Definition
Chunking is the document preprocessing step that divides large source materials into smaller text segments before embedding and indexing. Each chunk is embedded independently and stored as a separate vector in the retrieval database. When a query arrives, the system retrieves the most relevant chunks rather than entire documents, allowing precise, targeted context injection into the LLM's working memory instead of flooding the context window with irrelevant content.
Engineering Context
Chunking strategy significantly impacts RAG quality. Too-small chunks lose context; too-large chunks dilute relevance and consume context budget. A common starting point is 512 tokens per chunk with 50-token overlap, splitting on semantic boundaries (paragraphs, sections) rather than arbitrary token counts. Hierarchical chunking (indexing both sentences and paragraphs) improves retrieval precision. Libraries like LangChain and LlamaIndex provide chunking utilities with configurable splitters. For structured documents (PDFs, HTML), use document-aware splitters that respect headings and sections. Store chunk metadata (source document, page, section) alongside embeddings to enable source attribution in agent outputs.
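The approach above can be sketched in a few lines: split on paragraph boundaries, pack paragraphs into chunks under a token budget, carry an overlap window into the next chunk, and attach source metadata. This is a minimal illustration, not a production splitter; it approximates tokens as whitespace-separated words (real pipelines use the embedding model's tokenizer), and the `Chunk` type and `chunk_document` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    meta: dict = field(default_factory=dict)

def chunk_document(text: str, source: str,
                   max_tokens: int = 512, overlap: int = 50) -> list[Chunk]:
    """Split `text` on paragraph boundaries, packing paragraphs into
    chunks of at most `max_tokens` tokens (approximated as words),
    carrying the last `overlap` tokens into the next chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(current)
            current = current[-overlap:]  # overlap window for continuity
        # Note: a single paragraph longer than max_tokens still becomes
        # one oversized chunk; a real splitter would subdivide it.
        current.extend(words)
    if current:
        chunks.append(current)
    # Metadata travels with each chunk to support source attribution.
    return [Chunk(" ".join(c), {"source": source, "chunk_index": i})
            for i, c in enumerate(chunks)]
```

Equivalent functionality with semantic and document-aware splitting is available off the shelf, e.g. LangChain's `RecursiveCharacterTextSplitter` or LlamaIndex's node parsers, which also handle tokenizer-accurate lengths and format-specific boundaries.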