What are the common chunking strategies?

In RAG (Retrieval-Augmented Generation), chunking means splitting large documents into smaller pieces so they can be embedded, stored in a vector database, and retrieved efficiently.

In interviews, you’re expected to explain why chunking matters and what strategies exist, not just list them.

Why chunking is important

Without chunking:

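A single embedding has to summarize too much text, so specific meaning gets diluted
Long documents may exceed the embedding model's input limit
Retrieval returns irrelevant or noisy context
Retrieved text fills the LLM's context window with irrelevant material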

Good chunking improves:

Retrieval accuracy
Context relevance
Latency and cost

Common Chunking Strategies

1. Fixed-size (Naive) Chunking

You split text into chunks of a fixed size, measured in tokens or characters (e.g., 200 or 500 tokens).

Example:

Chunk 1: 0–500 tokens
Chunk 2: 500–1000 tokens
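The fixed-size split above can be sketched in a few lines. This is a minimal sketch that assumes whitespace-separated words stand in for tokens; a real pipeline would count tokens with the embedding model's own tokenizer. The helper name `fixed_size_chunks` is hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Hypothetical helper: split text into consecutive, non-overlapping
    chunks of chunk_size tokens.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    # Each chunk starts exactly where the previous one ended.
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


# Usage: 1,200 "tokens" give chunks covering 0-500, 500-1000 and 1000-1200.
doc = " ".join(f"w{i}" for i in range(1200))
print([len(c.split()) for c in fixed_size_chunks(doc, chunk_size=500)])
# [500, 500, 200]
```

Note that nothing here looks at sentence or paragraph boundaries, which is exactly why this strategy can cut a sentence in half.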

Pros:

Simple to implement
Fast

Cons:

Can split mid-sentence and break the meaning
Can cut important context in half

Interview line:

“It’s easy but often semantically poor because it ignores document structure.”

2. Overlapping Chunking

Same as fixed-size, but with overlap between chunks.

Example:

Chunk 1: tokens 0–500
Chunk 2: tokens 400–900 (the first 100 tokens repeat the end of Chunk 1)

Why overlap?

Preserves context across boundaries.
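A sliding window reproduces the 0–500 / 400–900 pattern: each window advances by `chunk_size - overlap` tokens. This is a sketch assuming whitespace-separated words as tokens; `overlapping_chunks` is a hypothetical helper name.

```python
def overlapping_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Hypothetical helper: fixed-size chunks where each chunk repeats the
    last `overlap` tokens of the previous one.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Usage: 1,000 tokens, 500-token chunks, 100-token overlap.
doc = " ".join(f"w{i}" for i in range(1000))
for chunk in overlapping_chunks(doc, chunk_size=500, overlap=100):
    words = chunk.split()
    print(words[0], words[-1])
# w0 w499
# w400 w899
# w800 w999
```

The early `break` avoids emitting a final window that lies entirely inside the previous one.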

Pros:

Reduces context loss
Improves retrieval continuity

Cons:

Higher storage and embedding cost
Duplicate information (retrieval may return near-identical chunks)

3. Sentence-based Chunking

Splits based on sentence boundaries.

Example:

Each chunk contains N sentences
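Grouping N sentences per chunk can be sketched as follows. It assumes a naive regex is good enough to detect sentence ends; libraries such as NLTK (`nltk.sent_tokenize`) or spaCy handle abbreviations like "e.g." that this regex would wrongly split on. `sentence_chunks` is a hypothetical helper name.

```python
import re

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
# Assumption: good enough for a sketch; it mis-splits abbreviations.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(text: str, sentences_per_chunk: int = 3) -> list[str]:
    """Hypothetical helper: group every N consecutive sentences into one chunk."""
    if sentences_per_chunk <= 0:
        raise ValueError("sentences_per_chunk must be positive")
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


text = "RAG retrieves context. Chunking splits documents. Overlap helps. Sentences stay whole!"
for chunk in sentence_chunks(text, sentences_per_chunk=2):
    print(chunk)
# RAG retrieves context. Chunking splits documents.
# Overlap helps. Sentences stay whole!
```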

Pros:

Better semantic integrity than fixed chunks
Easy to implement with NLP tools (e.g., NLTK, spaCy)

Cons:

A few sentences may still lack the surrounding context needed to interpret them
Chunk sizes vary with sentence length

4. Paragraph-based Chunking

Uses natural paragraph breaks.
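Paragraph chunking usually means splitting on blank lines. Because paragraphs can be too small, this sketch also merges consecutive short paragraphs up to a character budget. The merge rule and the `max_chars` limit are assumptions for illustration, and `paragraph_chunks` is a hypothetical helper; an oversized paragraph is left whole here and would need a further split.

```python
import re


def paragraph_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Hypothetical helper: split on blank lines, then merge consecutive
    paragraphs until adding the next one would exceed max_chars.

    A single paragraph longer than max_chars is kept whole (assumption:
    a later step, e.g. sentence splitting, would handle it).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # flush before the chunk grows too large
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Intro line.\n\nShort.\n\nAnother short one.\n\n" + "x" * 50
print([len(c) for c in paragraph_chunks(doc, max_chars=40)])
# [39, 50]
```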

Pros:

Paragraphs usually express one coherent idea
Works well for articles, blogs, documentation

Cons:

Paragraphs can be too large or too small

Common in real RAG systems for documentation.

5. Recursive Chunking (Hierarchical Splitting)

This is widely used in production systems.

It works as follows:

Try splitting by large separators (sections, headings)
If still too big → split by paragraphs
If still too big → split by sentences
Finally → token-based split
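The fallback order above can be sketched as a small recursive function. This is a simplified illustration, not LangChain's implementation: it assumes `"\n\n"` marks paragraphs, `". "` marks sentences, and `" "` marks words, with a hard character cut as the last resort (standing in for the token-based split). Separators at a chunk boundary are dropped, which a production splitter would typically keep. `recursive_split` is a hypothetical name.

```python
def recursive_split(
    text: str,
    max_chars: int = 500,
    separators: tuple[str, ...] = ("\n\n", ". ", " "),
) -> list[str]:
    """Hypothetical helper: split with the coarsest separator first and
    recurse into finer separators only for pieces still over max_chars."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # Last resort: fixed-size cut (stand-in for a token-based split).
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small neighbouring pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece alone is too big: retry it with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


text = "para one is here.\n\npara two. has two sentences.\n\n" + "x" * 25
for chunk in recursive_split(text, max_chars=20):
    print(repr(chunk))
```

Only the second paragraph and the unbroken run of `x` characters reach the finer levels; the short first paragraph stays intact.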

Pros:

Preserves structure and semantic meaning where possible
Adaptive to different document types

Cons:

Slightly more complex

Interview answer highlight:

“Recursive chunking is a robust general-purpose default, used in frameworks like LangChain (RecursiveCharacterTextSplitter).”

6. Semantic Chunking

Chunk boundaries are placed where the meaning shifts, detected by comparing embeddings.

How it works:

Compute an embedding for each sentence
Compare each sentence with the previous one (e.g., cosine similarity)
Start a new chunk when similarity drops below a threshold
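The three steps above can be sketched with a toy embedding so the example runs without a model. `embed` here is a hypothetical bag-of-words stand-in for a real sentence-embedding model, and the 0.2 threshold is an arbitrary assumption; with real embeddings the threshold has to be tuned. This version compares each sentence only with its predecessor; a variant could compare against the running chunk instead.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Hypothetical stand-in for a sentence-embedding model: word counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[str]:
    """Hypothetical helper: start a new chunk whenever a sentence's
    similarity to the previous sentence drops below `threshold`."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            chunks.append(" ".join(current))  # meaning shifted: close chunk
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Cats are small pets.",
    "Cats like to sleep.",
    "Stock markets fell today.",
    "Markets reacted to interest rates.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
# Cats are small pets. Cats like to sleep.
# Stock markets fell today. Markets reacted to interest rates.
```

The cost noted below comes from the `embed` call: with a real model, every sentence needs an embedding before any chunk exists.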

Pros:

Usually the best semantic coherence
Ideal for long-form content

Cons:

Expensive (every sentence must be embedded during preprocessing)
Slower pipeline

7. Structure-aware Chunking (Document-based)

Uses document structure such as:

Headings (H1, H2, H3)
Sections
Markdown / HTML structure
Code blocks

Example:

# Introduction → Chunk
# Methods → Chunk
# Conclusion → Chunk
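For Markdown, the heading-per-chunk idea can be sketched by starting a new chunk at every `#`-style heading. This assumes ATX headings (`#` to `######`) and ignores lines inside fenced code blocks, so a `# comment` in a code sample does not start a section. `markdown_section_chunks` is a hypothetical helper; HTML or PDF input would need its own parser.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_section_chunks(markdown: str) -> list[dict]:
    """Hypothetical helper: one chunk per Markdown heading, keeping the
    heading title with its body so each chunk is self-describing.
    Lines inside fenced code blocks are never treated as headings."""
    chunks: list[dict] = []
    title, lines = None, []
    in_code = False

    def flush() -> None:
        body = "\n".join(lines).strip()
        if title is not None or body:
            chunks.append({"heading": title, "text": body})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a code block
        match = None if in_code else _HEADING.match(line)
        if match:
            flush()  # close the previous section
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


doc = "# Introduction\nRAG overview.\n# Methods\nChunking details.\n# Conclusion\nSummary."
for chunk in markdown_section_chunks(doc):
    print(chunk)
# {'heading': 'Introduction', 'text': 'RAG overview.'}
# {'heading': 'Methods', 'text': 'Chunking details.'}
# {'heading': 'Conclusion', 'text': 'Summary.'}
```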

Pros:

High-quality retrieval when sections map to distinct topics
Preserves logical flow

Cons:

Requires structured documents

8. Token-aware Chunking (LLM-safe chunking)

Ensures each chunk stays within the embedding model's token limit and fits in the LLM's context window, counting tokens with the model's own tokenizer.
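A token budget can be enforced by greedily packing pieces (sentences, paragraphs) into chunks and measuring each candidate with a token counter. The default counter below is an assumption (whitespace words); with the tiktoken library you could pass `lambda s: len(enc.encode(s))` where `enc = tiktoken.get_encoding("cl100k_base")`. `token_aware_chunks` is a hypothetical helper; pieces that are too large on their own are re-packed word by word.

```python
from typing import Callable


def token_aware_chunks(
    pieces: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Hypothetical helper: pack pieces into chunks whose token count never
    exceeds max_tokens. Assumption: the default counter treats whitespace
    words as tokens; pass a real tokenizer-based counter in practice."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if count_tokens(candidate) <= max_tokens:
            current = candidate  # still under budget: keep packing
            continue
        if current:
            chunks.append(current)
            current = ""
        if count_tokens(piece) <= max_tokens:
            current = piece
        else:
            words = piece.split()
            if len(words) <= 1:
                raise ValueError("a single word exceeds max_tokens")
            # Piece is too big on its own: re-pack it word by word.
            sub = token_aware_chunks(words, max_tokens, count_tokens)
            chunks.extend(sub[:-1])
            current = sub[-1]  # the last part can still absorb later pieces
    if current:
        chunks.append(current)
    return chunks


print(token_aware_chunks(["a b c", "d e", "f g h i j k", "l"], max_tokens=4))
# ['a b c', 'd e', 'f g h i', 'j k l']
```

The budget is the only thing this step guarantees; where the cuts land still depends on how the pieces were produced.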

Pros:

Prevents input-length errors or silent truncation
Practical for production

Cons:

Does not preserve meaning by itself, so it is usually combined with a semantic or structural strategy

How to answer in interviews

A strong answer:

“Chunking strategies in RAG include fixed-size and overlapping chunking for simplicity, sentence and paragraph-based chunking for better semantic preservation, and more advanced methods like recursive and semantic chunking for production-grade systems. In real-world applications, recursive and structure-aware chunking are preferred because they balance context preservation and retrieval accuracy.”

Bonus: Real-world best practice

Many production RAG systems use:

A hybrid of recursive, overlapping, and structure-aware chunking

Because it:

Maintains semantic integrity
Handles different document formats
Improves retrieval precision
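One way to combine the three ideas is sketched below: split by Markdown headings first (structure-aware), fall back to paragraphs when a section is too long, then to overlapping word windows when a paragraph is still too long (recursive + overlap). This is a simplified illustration under stated assumptions: words stand in for tokens, the sentence level is skipped, small paragraphs are not merged back together, and all function names are hypothetical.

```python
import re

_HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Hypothetical helper (structure step): (heading, body) pairs."""
    sections, title, lines = [], "", []
    for line in markdown.splitlines():
        m = _HEADING.match(line)
        if m:
            if lines:
                sections.append((title, "\n".join(lines).strip()))
            title, lines = m.group(1).strip(), []
        else:
            lines.append(line)
    if lines:
        sections.append((title, "\n".join(lines).strip()))
    return sections


def windows(words: list[str], size: int, overlap: int) -> list[str]:
    """Hypothetical helper (overlap step): word windows sharing `overlap` words."""
    out = []
    for start in range(0, len(words), size - overlap):
        out.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return out


def hybrid_chunks(markdown: str, max_words: int = 200, overlap: int = 40) -> list[dict]:
    """Hypothetical pipeline: sections, then paragraphs, then overlapping
    windows, each chunk tagged with its section heading."""
    if not 0 <= overlap < max_words:
        raise ValueError("need 0 <= overlap < max_words")
    chunks = []
    for heading, body in split_sections(markdown):
        if len(body.split()) <= max_words:
            pieces = [body]  # the whole section fits
        else:
            pieces = []
            for para in re.split(r"\n\s*\n", body):
                words = para.split()
                if len(words) <= max_words:
                    pieces.append(para)
                else:
                    pieces.extend(windows(words, max_words, overlap))
        for piece in pieces:
            if piece.strip():
                chunks.append({"heading": heading, "text": piece.strip()})
    return chunks


md = "# Intro\nshort text\n# Details\n" + " ".join(f"w{i}" for i in range(10)) + "\n\nsmall para"
for chunk in hybrid_chunks(md, max_words=6, overlap=2):
    print(chunk)
```

Keeping the heading on every chunk is a design choice: it lets a retrieved window from deep inside a section still carry the topic it belongs to.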