AI Glossary

Comprehensive glossary of artificial intelligence and machine learning terms. Understand the key concepts behind open-source AI models.

Categories: Applications, Architecture, Capabilities, Challenges, Concepts, Evaluation, Formats, Hardware, Inference, Infrastructure, Models, Operations, Optimization, Parameters, Performance, Processing, Research, Safety, Security, Techniques, Training, Usage

Activation Function

Architecture

A mathematical function applied to neuron outputs that introduces non-linearity, enabling networks to learn complex patterns.
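
A minimal NumPy sketch of two common activation functions, ReLU and sigmoid, applied to a vector of raw neuron outputs (the values are illustrative):

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: zeroes out negative values, keeps positives.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # raw neuron outputs
print(relu(z))      # [0.  0.  0.  0.5 2. ]
print(sigmoid(z))   # values strictly between 0 and 1
```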

Adversarial Examples

Security

Inputs deliberately designed to cause a model to make mistakes, used for testing robustness and security.

Agent

Concepts

An AI system that can perceive its environment, make decisions, and take actions to achieve specific goals autonomously.

Alignment

Safety

The process of ensuring AI systems behave in accordance with human values and intentions, crucial for safety.

Attention Mechanism

Architecture

A technique that allows models to focus on specific parts of the input when processing information, crucial for transformer architectures.
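
A minimal sketch of scaled dot-product attention in NumPy, using toy query/key/value matrices; real transformer layers add learned projections and multiple heads:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scores measure how strongly each query position attends to each key position.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mix of the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (3, 4)
```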

Autoencoder

Architecture

A neural network that learns to compress data into a lower-dimensional representation and reconstruct it.

Autoregressive Model

Architecture

A model that generates outputs sequentially, where each new token depends on previously generated tokens.

Backpropagation

Training

An algorithm for training neural networks by computing gradients of the loss function with respect to weights.
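
A hand-worked sketch for a single linear neuron with a squared-error loss: the gradient is obtained via the chain rule, which is exactly what backpropagation automates layer by layer (values chosen for illustration):

```python
# One neuron: prediction y_hat = w * x, loss L = (y_hat - y)^2
x, y = 2.0, 10.0
w = 1.0

y_hat = w * x
loss = (y_hat - y) ** 2

# Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2 * (y_hat - y) * x
grad_w = 2.0 * (y_hat - y) * x
print(loss, grad_w)        # 64.0, -32.0

# One gradient step with learning rate 0.01 reduces the loss.
w -= 0.01 * grad_w
print((w * x - y) ** 2)    # smaller than 64.0
```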

Batch Size

Training

The number of training examples processed together in one iteration, affecting training speed and memory usage.

Beam Search

Inference

A decoding strategy that explores multiple possible sequences simultaneously to find high-quality outputs.

BERT

Models

Bidirectional Encoder Representations from Transformers - a pre-training method for NLP that reads text bidirectionally.

Bias (Model)

Challenges

Systematic errors in model predictions, often reflecting biases in training data or model architecture.

BLEU Score

Evaluation

A metric for evaluating machine translation quality by comparing generated text to reference translations.

Catastrophic Forgetting

Challenges

When a model loses previously learned knowledge while learning new information, a challenge in continual learning.

Chain-of-Thought

Techniques

A prompting technique that encourages models to show step-by-step reasoning, improving complex problem-solving.

Checkpoint

Training

A saved snapshot of model weights during training, allowing resumption or evaluation at specific points.

CLIP

Models

Contrastive Language-Image Pre-training - a model that learns visual concepts from natural language supervision.

Context Window

Capabilities

The maximum number of tokens a model can process at once. Larger context windows allow understanding longer documents.

Contrastive Learning

Training

A training approach that learns representations by contrasting similar and dissimilar examples.

Convolutional Neural Network (CNN)

Architecture

A neural network architecture designed for processing grid-like data such as images, using convolutional layers.

Cross-Attention

Architecture

An attention mechanism where queries from one sequence attend to keys and values from another sequence.

Data Augmentation

Training

Techniques for artificially expanding training datasets by creating modified versions of existing examples.

Decoder

Architecture

The component of a model that generates output sequences, often used in transformer architectures.

Diffusion Model

Architecture

A generative model that learns to reverse a gradual noising process, used in image generation like Stable Diffusion.

Distillation

Optimization

Training a smaller 'student' model to mimic a larger 'teacher' model, creating efficient compressed models.

Dropout

Training

A regularization technique that randomly deactivates neurons during training to prevent overfitting.
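
A minimal sketch of inverted dropout in NumPy (drop rate chosen for illustration): each activation is kept with probability 1 - p and survivors are rescaled so the expected value is unchanged; at inference time the layer is a no-op:

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    if not training or p == 0.0:
        return activations                      # identity at inference time
    mask = rng.random(activations.shape) >= p   # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)       # rescale to preserve the expectation

a = np.ones((2, 8))
print(dropout(a, p=0.5))   # roughly half the entries zeroed, survivors scaled to 2.0
```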

Embeddings

Concepts

Vector representations of text, images, or other data that capture semantic meaning in a continuous space.
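
A small sketch of why embeddings are useful: once data is mapped to vectors, semantic closeness can be measured with cosine similarity. The vectors below are made up for illustration; in practice they come from an embedding model:

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction, values near 0 mean unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat   = np.array([0.9, 0.8, 0.1])   # toy embedding for "cat"
tiger = np.array([0.8, 0.9, 0.2])   # toy embedding for "tiger"
car   = np.array([0.1, 0.2, 0.9])   # toy embedding for "car"

print(cosine_similarity(cat, tiger))  # high: semantically related
print(cosine_similarity(cat, car))    # low: unrelated
```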

Encoder

Architecture

The component of a model that processes input sequences into representations, often used in transformers.

Epoch

Training

One complete pass through the entire training dataset during the training process.

Few-shot Learning

Capabilities

A model's ability to learn new tasks from just a few examples, demonstrated through in-context learning.

Fine-tuning

Training

The process of adapting a pre-trained model to a specific task or domain by training on specialized data.

FLOPS

Hardware

Floating Point Operations Per Second - a measure of computational performance for AI hardware.

Foundation Model

Concepts

Large-scale models trained on broad data that can be adapted to many downstream tasks through fine-tuning.

GAN

Architecture

Generative Adversarial Network - two neural networks competing to generate realistic synthetic data.

GGUF

Formats

GPT-Generated Unified Format - a file format for storing quantized language models efficiently.

GPT

Models

Generative Pre-trained Transformer - an autoregressive language model architecture that generates text left-to-right.

Gradient Descent

Training

An optimization algorithm that iteratively adjusts model parameters to minimize the loss function.
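
A bare-bones sketch: gradient descent on a one-parameter quadratic loss with a fixed learning rate chosen for illustration. Each step moves the parameter opposite to the gradient, shrinking the loss:

```python
def loss(w):
    return (w - 3.0) ** 2          # minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # derivative of the loss

w, lr = 0.0, 0.1                   # start far from the minimum
for step in range(50):
    w -= lr * grad(w)              # move against the gradient
print(w, loss(w))                  # w is close to 3, loss close to 0
```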

Greedy Decoding

Inference

A simple generation strategy that always selects the most probable next token at each step.

Hallucination

Challenges

When an AI model generates false or nonsensical information presented as fact, a common challenge in LLMs.

Hyperparameter

Training

Configuration settings for training (like learning rate) that are set before training begins, not learned.

In-Context Learning

Capabilities

A model's ability to learn from examples provided in the prompt without updating its parameters.

Inference

Operations

The process of using a trained model to make predictions or generate outputs on new data.

Instruction Tuning

Training

Fine-tuning models on instruction-following datasets to improve their ability to follow user commands.

Jailbreaking

Security

Techniques to bypass safety guardrails in AI models, a security concern for deployed systems.

Knowledge Distillation

Optimization

See Distillation - transferring knowledge from a large model to a smaller, more efficient one.

KYI (Know Your Intelligence)

Evaluation

A comprehensive benchmarking framework that evaluates AI models across multiple dimensions beyond traditional metrics.

Latency

Performance

The time delay between input and output in AI systems, critical for real-time applications.

Layer Normalization

Architecture

A technique that normalizes activations across features to stabilize and accelerate training.
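
A minimal NumPy sketch of layer normalization over the feature dimension; the learnable scale (gamma) and shift (beta) are left at their usual initial values here:

```python
import numpy as np

def layer_norm(x, eps=1e-5, gamma=1.0, beta=0.0):
    # Normalize each example across its features to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x)
print(y.mean(), y.std())   # ~0.0 and ~1.0
```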

Learning Rate

Training

A hyperparameter controlling how much model weights are adjusted during training, crucial for convergence.

LLaMA

Models

Large Language Model Meta AI - Meta's family of open-source language models ranging from 7B to 405B parameters.

LLM

Concepts

Large Language Model - neural networks with billions of parameters trained on vast text corpora.

LoRA

Training

Low-Rank Adaptation - an efficient fine-tuning method that updates only a small subset of model parameters.
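
A shape-level sketch of the low-rank idea behind LoRA: instead of updating a full weight matrix W, train two small matrices A and B whose product forms the update. The dimensions and initialization below are illustrative:

```python
import numpy as np

d, r = 1024, 8                     # hidden size and low rank (r << d)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))        # frozen pre-trained weight
A = rng.normal(size=(r, d)) * 0.01 # trainable, small random init
B = np.zeros((d, r))               # trainable, initialized to zero

# Effective weight during fine-tuning: only A and B receive gradient updates.
W_effective = W + B @ A

full = d * d                       # parameters in W
lora = 2 * d * r                   # parameters in A and B
print(lora / full)                 # ~0.016: a small fraction of the full matrix
```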

Loss Function

Training

A mathematical function measuring the difference between model predictions and actual values, guiding training.

LSTM

Architecture

Long Short-Term Memory - a recurrent neural network architecture designed to handle long-term dependencies.

Masked Language Modeling

Training

A pre-training objective where models predict masked tokens in text, used in BERT-style models.

Mixture of Experts (MoE)

Architecture

An architecture where multiple specialized sub-models (experts) are combined, with routing determining which experts process each input.

Model Compression

Optimization

Techniques like quantization and pruning to reduce model size and computational requirements.

Multi-Head Attention

Architecture

An attention mechanism with multiple parallel attention layers, allowing models to focus on different aspects.

Multimodal

Capabilities

Models that can process and generate multiple types of data (text, images, audio, video) in a unified framework.

Neural Architecture Search

Research

Automated methods for discovering optimal neural network architectures for specific tasks.

Normalization

Training

Techniques for scaling data or activations to improve training stability and convergence speed.

Nucleus Sampling

Inference

A generation strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold.
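
A minimal NumPy sketch of nucleus (top-p) sampling over a toy next-token distribution: keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample from that set:

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=np.random.default_rng(0)):
    order = np.argsort(probs)[::-1]                   # tokens from most to least likely
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]                             # smallest set reaching cumulative p
    kept = probs[keep] / probs[keep].sum()            # renormalize within the nucleus
    return int(rng.choice(keep, p=kept))

probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])         # toy next-token distribution
print(nucleus_sample(probs, p=0.9))                   # samples from tokens 0, 1, 2 only
```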

One-shot Learning

Capabilities

Learning to perform a task from a single example, a special case of few-shot learning.

Overfitting

Challenges

When a model learns training data too well, including noise, resulting in poor generalization to new data.

Parameters

Architecture

The learnable weights in a neural network. More parameters generally mean more capacity but require more compute.

Perplexity

Evaluation

A metric measuring how well a language model predicts text, with lower values indicating better performance.
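
A worked sketch of the definition: perplexity is the exponential of the average negative log-likelihood the model assigns to the observed tokens (the probabilities below are invented for illustration):

```python
import numpy as np

# Probability the model assigned to each actual token in a short sequence.
token_probs = np.array([0.25, 0.10, 0.50, 0.05])

avg_nll = -np.mean(np.log(token_probs))   # average negative log-likelihood
perplexity = np.exp(avg_nll)
print(perplexity)   # ~6.3: roughly as uncertain as a 6-way uniform guess per token
```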

Positional Encoding

Architecture

Information added to embeddings to represent token positions in sequences, crucial for transformers.
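
A sketch of the sinusoidal positional encoding from the original Transformer paper: each position gets a vector of sines and cosines at different frequencies, which is added to the token embeddings:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]                  # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                           # even dims: sine
    pe[:, 1::2] = np.cos(angles)                           # odd dims: cosine
    return pe

print(sinusoidal_positions(seq_len=50, d_model=16).shape)  # (50, 16)
```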

Pre-training

Training

Initial training phase on large datasets to learn general representations before task-specific fine-tuning.

Prompt Engineering

Usage

The practice of crafting effective input prompts to elicit desired outputs from language models.

Pruning

Optimization

Removing unnecessary weights or neurons from a model to reduce size and improve efficiency.

Quantization

Optimization

Reducing the precision of model weights (e.g., from 32-bit to 8-bit) to decrease memory usage and increase speed.
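
A minimal sketch of symmetric 8-bit quantization of a weight tensor: map floats to integers in [-127, 127] with a single scale factor, then dequantize to see the small rounding error. Real schemes add per-channel scales, zero-points, and calibration:

```python
import numpy as np

weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)

scale = np.abs(weights).max() / 127.0             # one scale for the whole tensor
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale

print(weights.nbytes, q.nbytes)                   # 4000 bytes -> 1000 bytes
print(np.abs(weights - dequantized).max())        # small reconstruction error
```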

RAG

Techniques

Retrieval-Augmented Generation - combining language models with external knowledge retrieval for more accurate responses.

Recurrent Neural Network (RNN)

Architecture

A neural network architecture designed for sequential data, processing inputs one step at a time.

Reinforcement Learning from Human Feedback (RLHF)

Training

Training approach using human preferences to align model behavior with desired outcomes.

Residual Connection

Architecture

Skip connections that add layer inputs to outputs, enabling training of very deep networks.

ROUGE Score

Evaluation

Metrics for evaluating text summarization by comparing generated summaries to reference summaries.

Sampling

Inference

Methods for selecting tokens during generation, balancing between randomness and quality.

Self-Attention

Architecture

A mechanism where each position in a sequence attends to all positions, enabling parallel processing in transformers.

Semantic Search

Applications

Search based on meaning rather than keywords, typically using embeddings and vector similarity.

Sequence-to-Sequence

Architecture

Models that transform input sequences into output sequences, used for translation and summarization.

Softmax

Architecture

An activation function that converts logits into probability distributions over classes.
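
The definition as a short NumPy sketch: exponentiate the logits (after subtracting the max for numerical stability) and normalize so the outputs sum to 1:

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())               # ~[0.66 0.24 0.10], sums to 1.0
```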

Stable Diffusion

Models

An open-source text-to-image diffusion model capable of generating high-quality images from text descriptions.

Supervised Learning

Training

Training approach using labeled data where the model learns to map inputs to known outputs.

Temperature

Parameters

A parameter controlling randomness in generation. Higher values increase creativity, lower values increase determinism.
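
A small sketch showing how temperature reshapes the next-token distribution before sampling: logits are divided by the temperature, so values below 1 sharpen the distribution and values above 1 flatten it (toy logits chosen for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])

for temperature in (0.5, 1.0, 2.0):
    probs = softmax(logits / temperature)
    print(temperature, np.round(probs, 3))
# 0.5 -> sharper (more deterministic), 2.0 -> flatter (more random)
```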

Tensor

Concepts

A multi-dimensional array used to represent data and parameters in neural networks.

Token

Processing

The basic unit of text processing in language models, typically representing words or subwords.

Tokenization

Processing

The process of breaking text into smaller units (tokens) that models can process, typically words or subwords.
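
A toy sketch of the idea: split text into pieces and map each piece to an integer ID from a vocabulary. The whitespace split and vocabulary here are made up for illustration; real tokenizers use learned subword schemes (e.g. byte-pair encoding), so one word may become several tokens:

```python
# Toy whitespace tokenizer with a made-up vocabulary.
vocab = {"<unk>": 0, "open": 1, "source": 2, "models": 3, "are": 4, "fun": 5}

def tokenize(text):
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("Open source models are fun"))       # [1, 2, 3, 4, 5]
print(tokenize("Open source models are powerful"))  # [1, 2, 3, 4, 0]  (unknown word)
```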

Top-k Sampling

Inference

A generation strategy that samples from the k most probable next tokens at each step.
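
A minimal NumPy sketch: keep only the k most probable tokens, renormalize their probabilities, and sample from that reduced set (toy distribution and k chosen for illustration):

```python
import numpy as np

def top_k_sample(probs, k=3, rng=np.random.default_rng(0)):
    top = np.argsort(probs)[::-1][:k]        # indices of the k most likely tokens
    kept = probs[top] / probs[top].sum()     # renormalize within the top-k set
    return int(rng.choice(top, p=kept))

probs = np.array([0.40, 0.25, 0.15, 0.10, 0.10])   # toy next-token distribution
print(top_k_sample(probs, k=3))                    # only tokens 0, 1, 2 can be chosen
```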

Top-p Sampling

Inference

See Nucleus Sampling - sampling restricted to the smallest set of tokens whose cumulative probability exceeds p.

Transfer Learning

Training

Applying knowledge learned from one task to improve performance on a different but related task.

Transformer

Architecture

A neural network architecture based on self-attention mechanisms, foundational to modern LLMs and vision models.

Underfitting

Challenges

When a model is too simple to capture patterns in data, resulting in poor performance on both training and test data.

Unsupervised Learning

Training

Training approach using unlabeled data where models discover patterns without explicit supervision.

Validation Set

Training

A subset of data used to evaluate model performance during training and tune hyperparameters.

Variational Autoencoder (VAE)

Architecture

A generative model that learns probabilistic latent representations of data.

Vector Database

Infrastructure

A database optimized for storing and querying high-dimensional vectors, essential for semantic search and RAG.

Vision Transformer (ViT)

Architecture

A transformer architecture adapted for computer vision tasks, treating image patches as tokens.

Warmup

Training

Gradually increasing the learning rate at the start of training to stabilize optimization.

Weight Decay

Training

A regularization technique that penalizes large weights to prevent overfitting.

Whisper

Models

OpenAI's open-source speech recognition model trained on 680,000 hours of multilingual audio data.

Zero-shot Learning

Capabilities

A model's ability to perform tasks it wasn't explicitly trained on, using only natural language instructions.