AI Glossary
Comprehensive glossary of artificial intelligence and machine learning terms. Understand the key concepts behind open-source AI models.
Activation Function
Architecture: A mathematical function applied to neuron outputs that introduces non-linearity, enabling networks to learn complex patterns.
Adversarial Examples
Security: Inputs deliberately designed to cause a model to make mistakes, used for testing robustness and security.
Agent
Concepts: An AI system that can perceive its environment, make decisions, and take actions to achieve specific goals autonomously.
Alignment
Safety: The process of ensuring AI systems behave in accordance with human values and intentions, crucial for safety.
Attention Mechanism
Architecture: A technique that allows models to focus on specific parts of the input when processing information, crucial for transformer architectures.
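To make the mechanism concrete, here is a minimal pure-Python sketch of scaled dot-product attention (the form used in transformers) for a single query attending over a few key/value vectors; the function name and the toy numbers are illustrative only.

```python
import math

def scaled_dot_product_attention(query, keys, values):
    """Attention for a single query over a few key/value vectors (pure Python)."""
    d_k = len(query)
    # Similarity of the query to each key, scaled by sqrt(d_k) for numerical stability.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    # Softmax turns the scores into attention weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # The output is the attention-weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# Toy example: a 2-dimensional query attending over three key/value pairs.
out = scaled_dot_product_attention(query=[1.0, 0.0],
                                   keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                                   values=[[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
print(out)
```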
Autoencoder
Architecture: A neural network that learns to compress data into a lower-dimensional representation and reconstruct it.
Autoregressive Model
Architecture: A model that generates outputs sequentially, where each new token depends on previously generated tokens.
Backpropagation
Training: An algorithm for training neural networks by computing gradients of the loss function with respect to weights.
Batch Size
Training: The number of training examples processed together in one iteration, affecting training speed and memory usage.
Beam Search
Inference: A decoding strategy that explores multiple possible sequences simultaneously to find high-quality outputs.
BERT
Models: Bidirectional Encoder Representations from Transformers - a pre-training method for NLP that reads text bidirectionally.
Bias (Model)
Challenges: Systematic errors in model predictions, often reflecting biases in training data or model architecture.
BLEU Score
Evaluation: A metric for evaluating machine translation quality by comparing generated text to reference translations.
Catastrophic Forgetting
Challenges: When a model loses previously learned knowledge while learning new information, a challenge in continual learning.
Chain-of-Thought
Techniques: A prompting technique that encourages models to show step-by-step reasoning, improving complex problem-solving.
Checkpoint
Training: A saved snapshot of model weights during training, allowing resumption or evaluation at specific points.
CLIP
Models: Contrastive Language-Image Pre-training - a model that learns visual concepts from natural language supervision.
Context Window
Capabilities: The maximum number of tokens a model can process at once. Larger context windows allow understanding longer documents.
Contrastive Learning
Training: A training approach that learns representations by contrasting similar and dissimilar examples.
Convolutional Neural Network (CNN)
Architecture: A neural network architecture designed for processing grid-like data such as images, using convolutional layers.
Cross-Attention
Architecture: An attention mechanism where queries from one sequence attend to keys and values from another sequence.
Data Augmentation
Training: Techniques for artificially expanding training datasets by creating modified versions of existing examples.
Decoder
Architecture: The component of a model that generates output sequences, often used in transformer architectures.
Diffusion Model
Architecture: A generative model that learns to reverse a gradual noising process, used in image generation like Stable Diffusion.
Distillation
Optimization: Training a smaller 'student' model to mimic a larger 'teacher' model, creating efficient compressed models.
Dropout
Training: A regularization technique that randomly deactivates neurons during training to prevent overfitting.
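A minimal sketch of inverted dropout applied to a list of activations, assuming a drop probability of 0.5; the activation values are made up for illustration.

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p during training and
    scale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return list(activations)
    return [0.0 if random.random() < p else a / (1.0 - p) for a in activations]

random.seed(0)
print(dropout([0.2, 1.5, -0.7, 3.1], p=0.5))  # roughly half the activations are zeroed
```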
Embeddings
Concepts: Vector representations of text, images, or other data that capture semantic meaning in a continuous space.
Encoder
Architecture: The component of a model that processes input sequences into representations, often used in transformers.
Epoch
Training: One complete pass through the entire training dataset during the training process.
Few-shot Learning
Capabilities: A model's ability to learn new tasks from just a few examples, demonstrated through in-context learning.
Fine-tuning
Training: The process of adapting a pre-trained model to a specific task or domain by training on specialized data.
FLOPS
Hardware: Floating Point Operations Per Second - a measure of computational performance for AI hardware.
Foundation Model
Concepts: Large-scale models trained on broad data that can be adapted to many downstream tasks through fine-tuning.
GAN
Architecture: Generative Adversarial Network - two neural networks competing to generate realistic synthetic data.
GGUF
Formats: GPT-Generated Unified Format - a file format for storing quantized language models efficiently.
GPT
Models: Generative Pre-trained Transformer - an autoregressive language model architecture that generates text left-to-right.
Gradient Descent
Training: An optimization algorithm that iteratively adjusts model parameters to minimize the loss function.
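A tiny worked example, assuming a one-parameter quadratic loss: each step moves the weight against the gradient until it settles near the minimum.

```python
# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
w, learning_rate = 0.0, 0.1
for _ in range(100):
    gradient = 2 * (w - 3)            # slope of the loss at the current weight
    w = w - learning_rate * gradient  # step in the opposite direction of the gradient
print(round(w, 4))  # approaches the minimum at w = 3
```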
Greedy Decoding
Inference: A simple generation strategy that always selects the most probable next token at each step.
Hallucination
Challenges: When an AI model generates false or nonsensical information presented as fact, a common challenge in LLMs.
Hyperparameter
Training: Configuration settings for training (like learning rate) that are set before training begins, not learned.
In-Context Learning
Capabilities: A model's ability to learn from examples provided in the prompt without updating its parameters.
Inference
Operations: The process of using a trained model to make predictions or generate outputs on new data.
Instruction Tuning
Training: Fine-tuning models on instruction-following datasets to improve their ability to follow user commands.
Jailbreaking
Security: Techniques to bypass safety guardrails in AI models, a security concern for deployed systems.
Knowledge Distillation
Optimization: See Distillation - transferring knowledge from a large model to a smaller, more efficient one.
KYI (Know Your Intelligence)
Evaluation: A comprehensive benchmarking framework that evaluates AI models across multiple dimensions beyond traditional metrics.
Latency
Performance: The time delay between input and output in AI systems, critical for real-time applications.
Layer Normalization
Architecture: A technique that normalizes activations across features to stabilize and accelerate training.
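A minimal sketch of the computation for one feature vector, assuming a single scalar scale and shift; real implementations learn per-feature gamma and beta vectors.

```python
import math

def layer_norm(features, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance, then apply the
    learnable scale (gamma) and shift (beta)."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in features]

print(layer_norm([1.0, 2.0, 3.0, 4.0]))  # output has roughly zero mean and unit variance
```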
Learning Rate
Training: A hyperparameter controlling how much model weights are adjusted during training, crucial for convergence.
LLaMA
Models: Large Language Model Meta AI - Meta's family of open-source language models ranging from 7B to 405B parameters.
LLM
Concepts: Large Language Model - neural networks with billions of parameters trained on vast text corpora.
LoRA
Training: Low-Rank Adaptation - an efficient fine-tuning method that freezes the pre-trained weights and trains only small low-rank update matrices added to them.
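A rough conceptual sketch, not a real implementation: a frozen weight matrix is augmented with a trainable low-rank product B @ A, and only A and B are updated during fine-tuning. The shapes, values, and scaling factor below are illustrative.

```python
# Frozen 4x4 weight matrix W (identity here, purely for illustration) plus a
# trainable rank-1 update B @ A; only A and B would receive gradients.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.1, 0.2, 0.3, 0.4]]        # trainable, shape (r=1, 4)
B = [[0.5], [0.0], [0.0], [0.0]]  # trainable, shape (4, r=1)
scale = 1.0                       # commonly alpha / r in LoRA implementations

# Effective weight used in the forward pass: W + scale * (B @ A)
W_eff = [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(len(A)))
          for j in range(4)] for i in range(4)]
print(W_eff[0])  # only the low-rank update has modified the frozen weights
```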
Loss Function
Training: A mathematical function measuring the difference between model predictions and actual values, guiding training.
LSTM
Architecture: Long Short-Term Memory - a recurrent neural network architecture designed to handle long-term dependencies.
Masked Language Modeling
Training: A pre-training objective where models predict masked tokens in text, used in BERT-style models.
Mixture of Experts (MoE)
Architecture: An architecture where multiple specialized sub-models (experts) are combined, with routing determining which experts process each input.
Model Compression
Optimization: Techniques like quantization and pruning to reduce model size and computational requirements.
Multi-Head Attention
Architecture: An attention mechanism with multiple parallel attention heads, allowing models to focus on different aspects of the input.
Multimodal
Capabilities: Models that can process and generate multiple types of data (text, images, audio, video) in a unified framework.
Neural Architecture Search
Research: Automated methods for discovering optimal neural network architectures for specific tasks.
Normalization
Training: Techniques for scaling data or activations to improve training stability and convergence speed.
Nucleus Sampling
Inference: A generation strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold.
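A minimal sketch of nucleus (top-p) sampling over a made-up next-token distribution; the vocabulary and probabilities are illustrative.

```python
import random

def nucleus_sample(probs, p=0.9):
    """Sample from the smallest set of tokens whose cumulative probability exceeds p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break  # the nucleus now covers at least probability mass p
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

random.seed(0)
next_token_probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}
print(nucleus_sample(next_token_probs, p=0.9))  # "xylophone" falls outside the nucleus
```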
One-shot Learning
Capabilities: Learning to perform a task from a single example, a special case of few-shot learning.
Overfitting
Challenges: When a model learns training data too well, including noise, resulting in poor generalization to new data.
Parameters
Architecture: The learnable weights in a neural network. More parameters generally mean more capacity but require more compute.
Perplexity
Evaluation: A metric measuring how well a language model predicts text, with lower values indicating better performance.
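A small worked example, assuming a hypothetical model assigned the listed probabilities to the tokens it had to predict; perplexity is the exponential of the average negative log-probability.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to the observed tokens."""
    avg_nll = sum(-math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Probabilities a hypothetical model assigned to each token it was asked to predict.
print(round(perplexity([0.25, 0.5, 0.125, 0.25]), 2))  # 4.0: as uncertain as a 4-way choice
```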
Positional Encoding
Architecture: Information added to embeddings to represent token positions in sequences, crucial for transformers.
Pre-training
Training: Initial training phase on large datasets to learn general representations before task-specific fine-tuning.
Prompt Engineering
Usage: The practice of crafting effective input prompts to elicit desired outputs from language models.
Pruning
Optimization: Removing unnecessary weights or neurons from a model to reduce size and improve efficiency.
Quantization
Optimization: Reducing the precision of model weights (e.g., from 32-bit to 8-bit) to decrease memory usage and increase speed.
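A minimal sketch of symmetric 8-bit quantization with a single scale factor, using made-up weights; real schemes add per-channel scales, zero points, and calibration.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats onto integers in [-127, 127] with a
    single scale factor, trading a little precision for ~4x less memory than float32."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

q, scale = quantize_int8([0.02, -0.54, 1.27, -0.003])
print(q)                     # small integers instead of 32-bit floats
print(dequantize(q, scale))  # approximately recovers the original weights
```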
RAG
Techniques: Retrieval-Augmented Generation - combining language models with external knowledge retrieval for more accurate responses.
Recurrent Neural Network (RNN)
Architecture: A neural network architecture designed for sequential data, processing inputs one step at a time.
Reinforcement Learning from Human Feedback (RLHF)
Training: Training approach using human preferences to align model behavior with desired outcomes.
Residual Connection
Architecture: Skip connections that add layer inputs to outputs, enabling training of very deep networks.
ROUGE Score
Evaluation: Metrics for evaluating text summarization by comparing generated summaries to reference summaries.
Sampling
Inference: Methods for selecting tokens during generation, balancing between randomness and quality.
Self-Attention
Architecture: A mechanism where each position in a sequence attends to all positions, enabling parallel processing in transformers.
Semantic Search
Applications: Search based on meaning rather than keywords, typically using embeddings and vector similarity.
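A minimal sketch of ranking documents by cosine similarity between embedding vectors; the 3-dimensional "embeddings" here are made up, and a real system would obtain them from an embedding model and store them in a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = pointing the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; a real system would get these from an embedding model.
documents = {"doc_a": [0.9, 0.1, 0.0], "doc_b": [0.1, 0.9, 0.2], "doc_c": [0.8, 0.2, 0.1]}
query_embedding = [1.0, 0.0, 0.0]
ranked = sorted(documents, key=lambda d: cosine_similarity(query_embedding, documents[d]),
                reverse=True)
print(ranked)  # documents ordered by semantic closeness to the query
```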
Sequence-to-Sequence
Architecture: Models that transform input sequences into output sequences, used for translation and summarization.
Softmax
Architecture: An activation function that converts logits into a probability distribution over classes.
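A minimal, numerically stable implementation sketch; the logits are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1; subtracting the
    maximum logit first keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # roughly [0.66, 0.24, 0.10], summing to 1
```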
Stable Diffusion
Models: An open-source text-to-image diffusion model capable of generating high-quality images from text descriptions.
Supervised Learning
Training: Training approach using labeled data where the model learns to map inputs to known outputs.
Temperature
Parameters: A parameter controlling randomness in generation. Higher values increase creativity, lower values increase determinism.
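A minimal sketch showing how temperature rescales the logits before the softmax; the logits and temperature values are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature before softmax: values below 1 sharpen the
    distribution (more deterministic), values above 1 flatten it (more random)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=0.5))  # peaked: the top token dominates
print(softmax_with_temperature(logits, temperature=2.0))  # flatter: sampling is more diverse
```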
Tensor
Concepts: A multi-dimensional array used to represent data and parameters in neural networks.
Token
Processing: The basic unit of text processing in language models, typically representing words or subwords.
Tokenization
Processing: The process of breaking text into smaller units (tokens) that models can process, typically words or subwords.
Top-k Sampling
Inference: A generation strategy that samples from the k most probable next tokens at each step.
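A minimal sketch of top-k sampling over a made-up next-token distribution; the vocabulary and probabilities are illustrative.

```python
import random

def top_k_sample(probs, k=2):
    """Keep only the k most probable tokens, then sample among them
    (random.choices renormalizes the weights internally)."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    return random.choices(tokens, weights=weights)[0]

random.seed(0)
next_token_probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}
print(top_k_sample(next_token_probs, k=2))  # only "the" or "a" can ever be chosen
```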
Top-p Sampling
Inference: See Nucleus Sampling - samples from tokens whose cumulative probability exceeds p.
Transfer Learning
Training: Applying knowledge learned from one task to improve performance on a different but related task.
Transformer
Architecture: A neural network architecture based on self-attention mechanisms, foundational to modern LLMs and vision models.
Underfitting
Challenges: When a model is too simple to capture patterns in data, resulting in poor performance on both training and test data.
Unsupervised Learning
Training: Training approach using unlabeled data where models discover patterns without explicit supervision.
Validation Set
Training: A subset of data used to evaluate model performance during training and tune hyperparameters.
Variational Autoencoder (VAE)
Architecture: A generative model that learns probabilistic latent representations of data.
Vector Database
Infrastructure: A database optimized for storing and querying high-dimensional vectors, essential for semantic search and RAG.
Vision Transformer (ViT)
Architecture: A transformer architecture adapted for computer vision tasks, treating image patches as tokens.
Warmup
Training: Gradually increasing the learning rate at the start of training to stabilize optimization.
Weight Decay
Training: A regularization technique that penalizes large weights to prevent overfitting.
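A minimal sketch of a single SGD update with an L2-style weight-decay term folded into the gradient; the weights, gradients, and coefficients are illustrative.

```python
# One SGD step with weight decay: each weight is pulled slightly towards zero
# in addition to following the gradient of the loss.
weights = [0.8, -1.2, 0.05]
gradients = [0.1, -0.2, 0.0]   # illustrative gradients from some loss
lr, decay = 0.01, 0.1

weights = [w - lr * (g + decay * w) for w, g in zip(weights, gradients)]
print(weights)  # the decay term nudges each weight towards zero
```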
Whisper
Models: OpenAI's open-source speech recognition model trained on 680,000 hours of multilingual audio data.
Zero-shot Learning
Capabilities: A model's ability to perform tasks it wasn't explicitly trained on, using only natural language instructions.