In RAG (Retrieval-Augmented Generation), chunking means splitting large documents into smaller pieces so they can be embedded, stored in a vector database, and retrieved efficiently.
In interviews, you’re expected to explain why chunking matters and what strategies exist, not just list them.
Why chunking is important
Without chunking:
- Embeddings become too broad (loss of meaning)
- Retrieval returns irrelevant or noisy context
- LLM context window is wasted
Good chunking improves:
- Retrieval accuracy
- Context relevance
- Latency and cost
Common Chunking Strategies
1. Fixed-size (Naive) Chunking
Split the text into chunks of a fixed token or character size (e.g., 200 or 500 tokens), regardless of content.
Example:
Chunk 1: 0–500 tokens
Chunk 2: 500–1000 tokens
Pros:
- Simple to implement
- Fast
Cons:
- Breaks sentences/semantics
- Can cut important context in half
Interview line:
“It’s easy but often semantically poor because it ignores document structure.”
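A minimal sketch of fixed-size chunking, assuming the text has already been tokenized into a list. The function name `fixed_size_chunks` is illustrative, and whitespace splitting stands in for a real tokenizer.

```python
def fixed_size_chunks(tokens, chunk_size=500):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Whitespace splitting is a stand-in for a real tokenizer (assumption).
tokens = ("word " * 1200).split()
chunks = fixed_size_chunks(tokens, chunk_size=500)
print([len(c) for c in chunks])  # [500, 500, 200]
```

Note that the boundaries fall wherever the count runs out, which is exactly why sentences can be cut in half.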
2. Overlapping Chunking
Same as fixed-size chunking, but each chunk repeats the last tokens of the previous one.
Example:
- Chunk 1: tokens 0–500
- Chunk 2: tokens 400–900
Why overlap?
Preserves context across boundaries.
Pros:
- Reduces context loss
- Improves retrieval continuity
Cons:
- More storage cost
- Duplicate information
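The overlap can be sketched as a sliding window whose step is `chunk_size - overlap`. This is an illustrative sketch: the function name and the default values (500 and 100, matching the example above) are assumptions, not a standard API.

```python
def overlapping_chunks(tokens, chunk_size=500, overlap=100):
    """Slide a window of chunk_size tokens, advancing chunk_size - overlap each step."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = list(range(1000))  # integers stand in for token ids
chunks = overlapping_chunks(tokens, chunk_size=500, overlap=100)
print([(c[0], c[-1]) for c in chunks])  # [(0, 499), (400, 899), (800, 999)]
```

The storage cost is visible here: 1000 tokens produce 1200 stored tokens across the three chunks.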
3. Sentence-based Chunking
Splits based on sentence boundaries.
Example:
- Each chunk contains N sentences
Pros:
- Better semantic integrity than fixed chunks
- Easy to implement with NLP tools
Cons:
- Sentences may still lack full context
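A rough sketch of grouping N sentences per chunk. The regex splitter below is a deliberately naive assumption: it breaks on abbreviations such as "e.g.", so a real pipeline would more likely use a sentence tokenizer from an NLP library such as NLTK or spaCy.

```python
import re


def sentence_chunks(text, sentences_per_chunk=3):
    """Group consecutive sentences into chunks of sentences_per_chunk each."""
    if sentences_per_chunk <= 0:
        raise ValueError("sentences_per_chunk must be positive")
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


text = "Chunking splits documents. It helps retrieval! Does size matter? Yes."
print(sentence_chunks(text, sentences_per_chunk=2))
# ['Chunking splits documents. It helps retrieval!', 'Does size matter? Yes.']
```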
4. Paragraph-based Chunking
Uses natural paragraph breaks.
Pros:
- Highly semantic
- Works well for articles, blogs, documentation
Cons:
- Paragraphs can be too large or too small
Common in real RAG systems for documentation.
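A sketch of paragraph-based chunking that also addresses the "too small" problem by merging short neighbouring paragraphs. It assumes paragraphs are separated by blank lines; the `max_chars` budget and the function name are illustrative. Oversized paragraphs are kept whole here, so a further splitting step would still be needed for those.

```python
import re


def paragraph_chunks(text, max_chars=1000):
    """Split on blank lines, merging adjacent paragraphs while they fit in max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            # Either the first paragraph of a chunk, or the merge still fits.
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "Short intro.\n\nAnother short one.\n\n" + "A much longer paragraph. " * 5
for chunk in paragraph_chunks(doc, max_chars=60):
    print(len(chunk), repr(chunk[:30]))
```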
5. Recursive Chunking (Hierarchical Splitting)
This is widely used in production systems.
It works like:
- Try splitting by large separators (sections, headings)
- If still too big → split by paragraphs
- If still too big → split by sentences
- Finally → token-based split
Pros:
- Maintains structure + semantic meaning
- Adaptive to different document types
Cons:
- Slightly more complex
Interview answer highlight:
“Recursive chunking is a robust general-purpose default, and it is what LangChain’s RecursiveCharacterTextSplitter implements.”
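The fallback order described above can be sketched as a small recursive function. This is an illustrative sketch, not LangChain's implementation: the separator list, the character-based size limit (instead of tokens), and the name `recursive_split` are all assumptions.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_chars, trying coarse separators first."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # Last resort: hard cut by character count.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur; fall through to the next finer one.
        return recursive_split(text, max_chars, finer)
    chunks, current = [], ""
    for part in parts:
        candidate = current + sep + part if current else part
        if len(candidate) <= max_chars:
            current = candidate  # keep packing parts into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(part) > max_chars:
            # The part alone is too big: recurse with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
        else:
            current = part
    if current:
        chunks.append(current)
    return chunks


text = "aaaa bbbb\n\ncccc dddd eeee ffff gggg hhhh\n\niiii"
print(recursive_split(text, max_chars=20))
# ['aaaa bbbb', 'cccc dddd eeee ffff', 'gggg hhhh', 'iiii']
```

Only the middle paragraph exceeds the limit, so only that one is split further by spaces; the other paragraphs stay intact.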
6. Semantic Chunking
Chunks are created based on meaning similarity using embeddings.
How it works:
- Compute sentence embeddings
- Group sentences with similar meaning
- Break when similarity drops
Pros:
- Best semantic coherence
- Ideal for long-form content
Cons:
- Expensive (needs embeddings during preprocessing)
- Slower pipeline
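The three steps above can be sketched as follows. To keep the example self-contained, `toy_embed` is a bag-of-words stand-in for a real embedding model, and the 0.2 threshold is an arbitrary assumption; in practice you would pass in your own embedding function and tune the threshold.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Stand-in embedding (assumption): word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences, embed=toy_embed, threshold=0.2):
    """Start a new chunk when similarity between adjacent sentences drops below threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Cats purr when happy.",
    "Happy cats purr loudly.",
    "Interest rates rose sharply.",
    "Rates rose again today.",
]
print(semantic_chunks(sentences))
# ['Cats purr when happy. Happy cats purr loudly.',
#  'Interest rates rose sharply. Rates rose again today.']
```

The cost noted above comes from the `embed` call: every sentence is embedded once during preprocessing, before any chunk is stored.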
7. Structure-aware Chunking (Document-based)
Uses document structure like:
- Headings (H1, H2, H3)
- Sections
- Markdown / HTML structure
- Code blocks
Example:
# Introduction → Chunk
# Methods → Chunk
# Conclusion → Chunk
Pros:
- Very high quality retrieval
- Preserves logical flow
Cons:
- Requires structured documents
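For Markdown input, a heading-based split might look like the sketch below. It assumes ATX-style headings (`#` to `######`) and does not handle `#` lines inside fenced code blocks, which a real parser would need to skip. The function name `markdown_sections` is illustrative.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_sections(markdown):
    """Return (heading, body) pairs, one chunk per Markdown heading."""
    sections, heading, body = [], None, []

    def flush():
        # Emit the section collected so far, skipping an empty preamble.
        if heading is not None or any(ln.strip() for ln in body):
            sections.append((heading or "", "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


doc = "# Introduction\nHello.\n\n# Methods\nWe did X.\n# Conclusion\nDone."
print(markdown_sections(doc))
# [('Introduction', 'Hello.'), ('Methods', 'We did X.'), ('Conclusion', 'Done.')]
```

Keeping the heading alongside the body is useful because it can be stored as metadata or prepended to the chunk before embedding.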
8. Token-aware Chunking (LLM-safe chunking)
Ensures every chunk stays within the token limit of the embedding model or LLM.
Pros:
- Prevents overflow errors
- Practical for production
Cons:
- Still needs semantic strategy on top
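A sketch of a token-limit guard applied after another strategy has produced chunks. The `tokenize` and `detokenize` parameters are hypothetical hooks: whitespace splitting is only a stand-in, and in practice they should wrap the tokenizer of the embedding model or LLM actually in use, since token counts differ between models.

```python
def enforce_token_limit(chunks, max_tokens, tokenize=str.split, detokenize=" ".join):
    """Re-split any chunk whose token count exceeds max_tokens; leave the rest untouched."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    safe = []
    for chunk in chunks:
        tokens = tokenize(chunk)
        if len(tokens) <= max_tokens:
            safe.append(chunk)
            continue
        # Oversized chunk: fall back to fixed-size windows of max_tokens.
        for i in range(0, len(tokens), max_tokens):
            safe.append(detokenize(tokens[i:i + max_tokens]))
    return safe


print(enforce_token_limit(["a b c", "d e f g h"], max_tokens=3))
# ['a b c', 'd e f', 'g h']
```

This is why token-aware chunking sits on top of a semantic strategy: it only guarantees the size limit and says nothing about where the boundaries should fall.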
How to answer in interviews
A strong answer:
“Chunking strategies in RAG include fixed-size and overlapping chunking for simplicity, sentence and paragraph-based chunking for better semantic preservation, and more advanced methods like recursive and semantic chunking for production-grade systems. In real-world applications, recursive and structure-aware chunking are preferred because they balance context preservation and retrieval accuracy.”
Bonus: Real-world best practice
Many production RAG systems combine several strategies:
Recursive + overlapping + structure-aware chunking
Because:
- Maintains semantic integrity
- Handles different document formats
- Improves retrieval precision
