AI Agent Glossary
50 terms precisely defined for engineers building production AI systems. No marketing fluff—just engineering accuracy.
Core AI Agent Concepts
AI Agent: A software system that perceives its environment, reasons about goals, and takes autonomous actions using LLMs.
Autonomous Agent: An agent that operates without continuous human oversight, making decisions and taking actions independently.
Agentic Workflow: A multi-step process where an LLM orchestrates tools, memory, and reasoning to complete complex tasks.
Tool Use: The ability of an LLM to call external functions, APIs, or services as part of its reasoning process.
Reasoning Engine: The LLM component responsible for interpreting inputs, forming plans, and generating decisions in an agent system.
Agent Loop: The observe-reason-act cycle that an agent iterates through until a task is completed or a stopping condition is met.
Orchestration: The coordination of multiple agents, tools, and data sources to execute a complex multi-step workflow.
Guardrails: Validation and safety checks applied to agent inputs and outputs to prevent harmful, incorrect, or out-of-scope behavior.
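The observe-reason-act cycle can be sketched as a minimal loop skeleton. All names here (`reason`, `act`, `is_done`) are illustrative placeholders, not any particular framework's API:

```python
def run_agent(goal, reason, act, is_done, max_steps=10):
    """Minimal observe-reason-act loop: iterate until the task is
    complete or a stopping condition (here, a step budget) is hit."""
    observation = goal
    history = []
    for _ in range(max_steps):
        decision = reason(observation, history)  # LLM decides the next action
        if is_done(decision):                    # task completed
            return decision
        observation = act(decision)              # execute a tool, observe the result
        history.append((decision, observation))
    return None  # stopping condition: step budget exhausted
```

Production loops add the guardrails described above around both `reason` and `act`, plus timeouts and cost limits alongside the step budget.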
Architecture & Design Patterns
Directed Acyclic Graph (DAG): A graph structure where nodes represent agent steps and directed edges define execution order, with no cycles.
State Machine: A computational model that defines an agent's possible states and the transitions between them, ensuring deterministic behavior.
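A state machine for an agent can be as simple as a transition table: any event not declared for the current state is rejected, which is what makes behavior deterministic and auditable. The states and events below are illustrative:

```python
class AgentStateMachine:
    """Deterministic finite state machine: states and allowed
    transitions are declared up front; anything else is rejected."""

    TRANSITIONS = {
        ("idle", "start"): "planning",
        ("planning", "plan_ready"): "acting",
        ("acting", "tool_ok"): "planning",
        ("acting", "task_done"): "done",
        ("acting", "tool_error"): "failed",
    }

    def __init__(self):
        self.state = "idle"

    def send(self, event):
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"illegal transition: {key}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

Because every reachable state and transition is enumerated, the machine can be exhaustively tested, unlike free-form LLM control flow.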
Retrieval-Augmented Generation (RAG): An architecture that grounds LLM responses in retrieved source documents, reducing hallucination and improving accuracy.
Multi-Agent System: An architecture where multiple specialized agents collaborate, each handling a subset of a complex task.
Chain-of-Thought (CoT): A prompting technique that elicits step-by-step reasoning from an LLM before producing a final answer.
ReAct: A framework combining Reasoning and Acting: the LLM reasons about what to do, acts via tools, then observes the results.
Human-in-the-Loop (HITL): A design pattern where human review and approval are required at designated checkpoints in an agent workflow.
Function Calling: The mechanism by which an LLM selects and invokes external functions with structured arguments during inference.
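On the application side, function calling reduces to parsing the model's structured output and dispatching it against a tool registry. This is a minimal sketch with a stand-in tool; the JSON shape (`name` plus `arguments`) mirrors common provider formats but is assumed here, not tied to any specific API:

```python
import json

# Registry of callable tools; the lambda is a stand-in for a real API client.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(tool_call_json):
    """Parse a structured tool call (as an LLM would emit it),
    look the function up in the registry, and invoke it."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])
```

A real dispatcher would also validate the arguments against a schema before invocation, since model-emitted arguments are untrusted input.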
LLM Technology
Large Language Model (LLM): A neural network trained on large text corpora to generate and understand natural language, serving as the core of AI agents.
Tokenization: The process of converting text into numerical tokens for LLM processing. Token count determines cost and context limits.
Context Window: The maximum number of tokens an LLM can process in a single inference call, defining the agent's working-memory limit.
Temperature: A sampling parameter that controls output randomness. Temperature 0 approximates greedy, deterministic decoding; higher values increase variability.
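Mechanically, temperature divides the logits before the softmax: small values sharpen the distribution toward the most likely token, large values flatten it. A self-contained sketch over toy logits (the seeded `rng` is only there to keep the example reproducible):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Temperature scaling: divide logits by T before softmax.
    T = 0 falls back to argmax (greedy); larger T flattens the
    distribution, increasing output randomness."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                              # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

Note that even at temperature 0, production systems can show residual nondeterminism from batching and floating-point ordering, which is why determinism is treated as a separate property in the evaluation section below.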
Inference: The process of running a trained model to generate predictions or outputs from new inputs; the computation performed at production time.
Fine-Tuning: Additional training of a pre-trained LLM on domain-specific data to improve performance on target tasks or adopt specific behaviors.
Prompt Engineering: The practice of designing and optimizing LLM input prompts to reliably elicit desired outputs and behaviors.
Memory & Storage
Vector Database: A database optimized for storing and querying high-dimensional embedding vectors, enabling semantic similarity search.
Embedding: A dense numerical vector representation of text that captures semantic meaning, enabling similarity comparisons.
Semantic Search: Retrieval based on meaning and context rather than keyword matching, using embedding similarity to find relevant content.
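At its core, semantic search ranks stored embeddings by cosine similarity to a query embedding; a vector database does exactly this, at scale and with approximate-nearest-neighbor indexes. A minimal sketch with toy two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of the vectors divided by
    the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec, index, top_k=1):
    """Rank stored (doc_id, embedding) pairs by similarity to the
    query embedding and return the top_k document ids."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

Real embeddings have hundreds to thousands of dimensions, but the ranking logic is unchanged.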
Long-Term Memory: An agent's stored record of past interactions and experiences, retrievable to inform future decisions.
Short-Term Memory: The in-context information available to an agent during a single inference call, bounded by the context window.
Knowledge Graph: A structured representation of entities and their relationships, enabling precise fact retrieval and multi-hop reasoning.
Chunking: The process of splitting documents into smaller segments for indexing and retrieval in RAG systems.
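The simplest chunking strategy is fixed-size windows with overlap, so a fact that straddles a boundary still appears intact in at least one chunk. A sketch using character counts (production systems usually chunk by tokens or by document structure instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Fixed-size chunking with overlap: each chunk starts
    (chunk_size - overlap) characters after the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Chunk size trades recall against precision: larger chunks carry more context per retrieval but dilute the embedding's focus.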
Deployment & Infrastructure
Self-Hosted LLM: An LLM deployed within an organization's own infrastructure, keeping model weights and data under direct control.
Model Serving: The infrastructure layer that exposes trained models as APIs, handling batching, scaling, and request routing.
Inference Endpoint: An API endpoint that accepts inputs and returns model predictions, serving as the interface for agent-model interaction.
Latency: The time from request to response in an LLM call. Time-to-first-token (TTFT) and total generation time are the key metrics.
Throughput: The number of tokens or requests a model-serving system can process per unit of time, typically measured in tokens/second or requests/second.
GPU (Graphics Processing Unit): A processor used for LLM inference and training because of its massively parallel architecture.
Quantization: A compression technique that reduces model precision (e.g., float32 to int8) to decrease memory usage and increase inference speed.
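The idea behind quantization can be shown on a single weight vector: symmetric int8 quantization maps floats onto integers in [-127, 127] using one scale factor, and dequantization recovers an approximation. This is a didactic sketch, not how inference libraries implement it (they quantize per-channel or per-block, on tensors):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: scale so the largest-magnitude
    weight maps to 127, then round every weight to an integer.
    Assumes at least one nonzero weight."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the error per weight is bounded
    by half the quantization step."""
    return [v * scale for v in q]
```

Each int8 weight uses a quarter of the memory of a float32 weight, which is where the footprint and bandwidth savings come from.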
Security & Compliance
Prompt Injection: An attack where malicious input overrides an agent's system prompt, hijacking its behavior or extracting sensitive information.
Data Privacy: Policies and controls ensuring that sensitive information processed by AI agents is protected and handled in line with regulations.
Audit Trail: An immutable log of agent decisions, inputs, and outputs that enables accountability, debugging, and regulatory compliance.
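One common way to make a log tamper-evident is hash chaining: each entry commits to the previous entry's hash, so altering any earlier record invalidates every later hash. A sketch of the idea (true immutability additionally needs append-only storage; this only detects tampering):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_entry(log, record):
    """Append a record to a hash-chained audit log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(record, sort_keys=True)  # canonical serialization
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"record": record, "prev": prev_hash, "hash": entry_hash})
    return log

def verify(log):
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```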
Role-Based Access Control (RBAC): A security model that restricts agent capabilities and data access based on the roles of users and systems.
PII Detection: Automated identification of Personally Identifiable Information in agent inputs and outputs to prevent data leakage.
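A first line of defense is pattern-based redaction before text is logged or sent onward. The two patterns below are deliberately minimal illustrations; production PII detection needs far broader coverage (names, addresses, locale-specific identifier formats) and usually combines rules with ML models:

```python
import re

# Illustrative patterns only: a loose email matcher and US-style SSNs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text):
    """Replace matched PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders (rather than blanks) keep redacted logs readable for debugging while preventing leakage.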
Hallucination: When an LLM generates confident-sounding but factually incorrect or fabricated information not grounded in its context.
Jailbreaking: Techniques used to bypass an LLM's safety constraints or system-prompt restrictions to elicit prohibited outputs.
Evaluation & Testing
Evals: Structured evaluation frameworks for measuring LLM output quality, accuracy, and safety across a defined test set.
Benchmark: A standardized test suite used to compare model or agent performance across consistent tasks and metrics.
Determinism: The property of producing identical outputs for identical inputs; critical for auditable, testable production AI agents.
Regression Testing: Testing that verifies a change (a new prompt, model, or code) doesn't degrade previously working agent behaviors.
Confidence Score: A numerical estimate of an agent's certainty in its output, used to route low-confidence decisions to human review.
Ground Truth: Verified correct answers used as the reference for evaluating agent output quality during testing and evaluation.
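The simplest eval metric ties several of these terms together: score agent outputs against ground truth with exact-match accuracy. A minimal sketch; real evals also score partial credit, safety, and formatting, often with rubric or model-graded scoring:

```python
def accuracy(predictions, ground_truth):
    """Exact-match accuracy: the fraction of predictions that
    equal the verified reference answer."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must align")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)
```

Run against a fixed test set before and after any prompt or model change, this single number doubles as a basic regression test.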