AI Agent Glossary
50 terms precisely defined for engineers building production AI systems. No marketing fluff—just engineering accuracy.
Core AI Agent Concepts
AI Agent: A software system that perceives its environment, reasons about goals, and takes autonomous actions using LLMs.
Autonomous Agent: An agent that operates without continuous human oversight, making decisions and taking actions independently.
Agentic Workflow: A multi-step process where an LLM orchestrates tools, memory, and reasoning to complete complex tasks.
Tool Use: The ability of an LLM to call external functions, APIs, or services as part of its reasoning process.
Reasoning Engine: The LLM component responsible for interpreting inputs, forming plans, and generating decisions in an agent system.
Agent Loop: The observe-reason-act cycle that an agent iterates through until a task is completed or a stopping condition is met.
Orchestration: The coordination of multiple agents, tools, and data sources to execute a complex multi-step workflow.
Guardrails: Validation and safety checks applied to agent inputs and outputs to prevent harmful, incorrect, or out-of-scope behavior.
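The observe-reason-act cycle can be sketched as a minimal loop skeleton. All names here (`reason`, `act`, `is_done`) are illustrative placeholders, not any particular framework's API:

```python
def run_agent(goal, reason, act, is_done, max_steps=10):
    """Minimal observe-reason-act loop: iterate until the task is
    complete or a stopping condition (here, a step budget) is hit."""
    observation = goal
    history = []
    for _ in range(max_steps):
        decision = reason(observation, history)  # LLM decides the next action
        if is_done(decision):                    # task completed
            return decision
        observation = act(decision)              # execute a tool, observe the result
        history.append((decision, observation))
    return None  # stopping condition: step budget exhausted
```

Production loops add the guardrails described above around both `reason` and `act`, plus timeouts and cost limits alongside the step budget.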
Architecture & Design Patterns
Directed Acyclic Graph (DAG): A graph structure where nodes represent agent steps and directed edges define execution order, with no cycles.
State Machine: A computational model that defines an agent's possible states and the transitions between them, ensuring deterministic behavior.
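A state machine for an agent can be as simple as a transition table: any event not declared for the current state is rejected, which is what makes behavior deterministic and auditable. The states and events below are illustrative:

```python
class AgentStateMachine:
    """Deterministic finite state machine: states and allowed
    transitions are declared up front; anything else is rejected."""

    TRANSITIONS = {
        ("idle", "start"): "planning",
        ("planning", "plan_ready"): "acting",
        ("acting", "tool_ok"): "planning",
        ("acting", "task_done"): "done",
        ("acting", "tool_error"): "failed",
    }

    def __init__(self):
        self.state = "idle"

    def send(self, event):
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"illegal transition: {key}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

Because every reachable state and transition is enumerated, the machine can be exhaustively tested, unlike free-form LLM control flow.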
Retrieval-Augmented Generation (RAG): An architecture that grounds LLM responses in retrieved source documents, reducing hallucination and improving accuracy.
Multi-Agent System: An architecture where multiple specialized agents collaborate, each handling a subset of a complex task.
Chain-of-Thought (CoT): A prompting technique that elicits step-by-step reasoning from an LLM before producing a final answer.
ReAct: A framework combining Reasoning and Acting: the LLM reasons about what to do, acts via tools, then observes the results.
Human-in-the-Loop (HITL): A design pattern where human review and approval are required at designated checkpoints in an agent workflow.
Function Calling: The mechanism by which an LLM selects and invokes external functions with structured arguments during inference.
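On the application side, function calling reduces to parsing the model's structured output and dispatching it against a tool registry. This is a minimal sketch with a stand-in tool; the JSON shape (`name` plus `arguments`) mirrors common provider formats but is assumed here, not tied to any specific API:

```python
import json

# Registry of callable tools; the lambda is a stand-in for a real API client.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(tool_call_json):
    """Parse a structured tool call (as an LLM would emit it),
    look the function up in the registry, and invoke it."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])
```

A real dispatcher would also validate the arguments against a schema before invocation, since model-emitted arguments are untrusted input.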
LLM Technology
Large Language Model (LLM): A neural network trained on large text corpora to generate and understand natural language, serving as the core of AI agents.
Tokenization: The process of converting text into numerical tokens for LLM processing. Token count determines cost and context limits.
Context Window: The maximum number of tokens an LLM can process in a single inference call, defining the agent's working-memory limit.
Temperature: A sampling parameter that controls output randomness. Temperature 0 approximates greedy, deterministic decoding; higher values increase variability.
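Mechanically, temperature divides the logits before the softmax: small values sharpen the distribution toward the most likely token, large values flatten it. A self-contained sketch over toy logits (the seeded `rng` is only there to keep the example reproducible):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Temperature scaling: divide logits by T before softmax.
    T = 0 falls back to argmax (greedy); larger T flattens the
    distribution, increasing output randomness."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                              # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

Note that even at temperature 0, production systems can show residual nondeterminism from batching and floating-point ordering, which is why determinism is treated as a separate property in the evaluation section below.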
Inference: The process of running a trained model to generate predictions or outputs from new inputs; the computation performed at production time.
Fine-Tuning: Additional training of a pre-trained LLM on domain-specific data to improve performance on target tasks or adopt specific behaviors.
Prompt Engineering: The practice of designing and optimizing LLM input prompts to reliably elicit desired outputs and behaviors.
Memory & Storage
Vector Database: A database optimized for storing and querying high-dimensional embedding vectors, enabling semantic similarity search.
Embedding: A dense numerical vector representation of text that captures semantic meaning, enabling similarity comparisons.
Semantic Search: Retrieval based on meaning and context rather than keyword matching, using embedding similarity to find relevant content.
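At its core, semantic search ranks stored embeddings by cosine similarity to a query embedding; a vector database does exactly this, at scale and with approximate-nearest-neighbor indexes. A minimal sketch with toy two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of the vectors divided by
    the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec, index, top_k=1):
    """Rank stored (doc_id, embedding) pairs by similarity to the
    query embedding and return the top_k document ids."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

Real embeddings have hundreds to thousands of dimensions, but the ranking logic is unchanged.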
Long-Term Memory: An agent's stored record of past interactions and experiences, retrievable to inform future decisions.
Short-Term Memory: The in-context information available to an agent during a single inference call, bounded by the context window.
Knowledge Graph: A structured representation of entities and their relationships, enabling precise fact retrieval and multi-hop reasoning.
Chunking: The process of splitting documents into smaller segments for indexing and retrieval in RAG systems.
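The simplest chunking strategy is fixed-size windows with overlap, so a fact that straddles a boundary still appears intact in at least one chunk. A sketch using character counts (production systems usually chunk by tokens or by document structure instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Fixed-size chunking with overlap: each chunk starts
    (chunk_size - overlap) characters after the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Chunk size trades recall against precision: larger chunks carry more context per retrieval but dilute the embedding's focus.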
Deployment & Infrastructure
Self-Hosted LLM: An LLM deployed within an organization's own infrastructure, keeping model weights and data under direct control.
Model Serving: The infrastructure layer that exposes trained models as APIs, handling batching, scaling, and request routing.
Inference Endpoint: An API endpoint that accepts inputs and returns model predictions, serving as the interface for agent-model interaction.
Latency: The time from request to response in an LLM call. Time-to-first-token (TTFT) and total generation time are the key metrics.
Throughput: The number of tokens or requests a model-serving system can process per unit of time, typically measured in tokens/second or requests/second.
GPU (Graphics Processing Unit): A processor used for LLM inference and training because of its massively parallel architecture.
Quantization: A compression technique that reduces model precision (e.g., float32 to int8) to decrease memory usage and increase inference speed.
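The idea behind quantization can be shown on a single weight vector: symmetric int8 quantization maps floats onto integers in [-127, 127] using one scale factor, and dequantization recovers an approximation. This is a didactic sketch, not how inference libraries implement it (they quantize per-channel or per-block, on tensors):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: scale so the largest-magnitude
    weight maps to 127, then round every weight to an integer.
    Assumes at least one nonzero weight."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the error per weight is bounded
    by half the quantization step."""
    return [v * scale for v in q]
```

Each int8 weight uses a quarter of the memory of a float32 weight, which is where the footprint and bandwidth savings come from.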
Security & Compliance
Prompt Injection: An attack where malicious input overrides an agent's system prompt, hijacking its behavior or extracting sensitive information.
Data Privacy: Policies and controls ensuring that sensitive information processed by AI agents is protected and handled in line with regulations.
Audit Trail: An immutable log of agent decisions, inputs, and outputs that enables accountability, debugging, and regulatory compliance.
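One common way to make a log tamper-evident is hash chaining: each entry commits to the previous entry's hash, so altering any earlier record invalidates every later hash. A sketch of the idea (true immutability additionally needs append-only storage; this only detects tampering):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_entry(log, record):
    """Append a record to a hash-chained audit log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(record, sort_keys=True)  # canonical serialization
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"record": record, "prev": prev_hash, "hash": entry_hash})
    return log

def verify(log):
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```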
Role-Based Access Control (RBAC): A security model that restricts agent capabilities and data access based on the roles of users and systems.
PII Detection: Automated identification of Personally Identifiable Information in agent inputs and outputs to prevent data leakage.
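A first line of defense is pattern-based redaction before text is logged or sent onward. The two patterns below are deliberately minimal illustrations; production PII detection needs far broader coverage (names, addresses, locale-specific identifier formats) and usually combines rules with ML models:

```python
import re

# Illustrative patterns only: a loose email matcher and US-style SSNs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text):
    """Replace matched PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders (rather than blanks) keep redacted logs readable for debugging while preventing leakage.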
Hallucination: When an LLM generates confident-sounding but factually incorrect or fabricated information not grounded in its context.
Jailbreaking: Techniques used to bypass an LLM's safety constraints or system-prompt restrictions to elicit prohibited outputs.
Evaluation & Testing
Evals: Structured evaluation frameworks for measuring LLM output quality, accuracy, and safety across a defined test set.
Benchmark: A standardized test suite used to compare model or agent performance across consistent tasks and metrics.
Determinism: The property of producing identical outputs for identical inputs; critical for auditable, testable production AI agents.
Regression Testing: Testing that verifies a change (a new prompt, model, or code) doesn't degrade previously working agent behaviors.
Confidence Score: A numerical estimate of an agent's certainty in its output, used to route low-confidence decisions to human review.
Ground Truth: Verified correct answers used as the reference for evaluating agent output quality during testing and evaluation.
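The simplest eval metric ties several of these terms together: score agent outputs against ground truth with exact-match accuracy. A minimal sketch; real evals also score partial credit, safety, and formatting, often with rubric or model-graded scoring:

```python
def accuracy(predictions, ground_truth):
    """Exact-match accuracy: the fraction of predictions that
    equal the verified reference answer."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must align")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)
```

Run against a fixed test set before and after any prompt or model change, this single number doubles as a basic regression test.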