Security & Compliance

Prompt Injection

An attack vector where malicious content in agent inputs overrides system prompt instructions, hijacking agent behavior or extracting sensitive information such as system prompts or API keys.

Definition

Prompt injection exploits the LLM's inability to distinguish between legitimate instructions from developers and adversarial instructions injected through untrusted input channels. Because LLMs process system prompts and user inputs as a unified token stream, crafted user messages or retrieved document content can override prior instructions, change the agent's goals, or cause it to perform actions outside its intended scope. This makes prompt injection fundamentally different from traditional SQL injection—it requires no code execution, only carefully worded text.

Engineering Context

Prompt injection is the OWASP #1 vulnerability for LLM applications. Two variants: direct injection (user directly tries to override instructions) and indirect injection (malicious content in retrieved documents or tool results). Mitigations: input validation before LLM call, output validation after, privileged/unprivileged context separation, and never storing secrets in system prompts. Structural defenses include instructing the model to treat user input as data rather than instructions, using XML/JSON delimiters to clearly delineate context sections, and implementing a separate classification model to detect injection attempts before routing to the main LLM.

Related Terms

Building production AI agents?

We design and implement deterministic AI agent systems for enterprise teams.

Start Assessment