What is Chunking in RAG?

In Retrieval-Augmented Generation (RAG), chunking means splitting large documents into smaller pieces called chunks before embedding them and storing the embeddings in a vector database.

These chunks are later retrieved and sent to the LLM as context for answering questions.

Why Chunking is Needed

LLMs and embedding models work better with smaller, focused text segments.

If you store an entire document as one embedding:

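- Retrieval becomes less accurate, because one vector has to represent every topic in the document
- Important details may get diluted
- Token limits become a problem, both for the embedding model's input and for the LLM's context window
- The retriever can only return the whole document, so the LLM receives mostly irrelevant content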

Chunking improves:

- Retrieval accuracy
- Semantic search quality
- Context relevance
- Response quality

Simple Flow in RAG

1. Load documents
2. Split documents into chunks
3. Generate embeddings for each chunk
4. Store embeddings in vector DB
5. User asks a question
6. Retrieve relevant chunks
7. Send chunks + question to LLM
8. Generate final answer

Example

Original Document

Artificial Intelligence is transforming healthcare.
Machine learning helps in disease prediction.
Deep learning is widely used in medical imaging.

After Chunking

Chunk 1

Artificial Intelligence is transforming healthcare.

Chunk 2

Machine learning helps in disease prediction.

Chunk 3

Deep learning is widely used in medical imaging.

Now if the user asks:

“How is deep learning used in healthcare?”

The retriever can fetch Chunk 3 directly, because it is the closest match to the question.
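The flow and the example above can be sketched in a few lines. This is only an illustration: `embed` here is a toy word-count vector standing in for a real embedding model, the Python list `index` stands in for a vector database, and all function names are made up for this sketch.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

document = (
    "Artificial Intelligence is transforming healthcare.\n"
    "Machine learning helps in disease prediction.\n"
    "Deep learning is widely used in medical imaging."
)

# Load and split: here simply one chunk per line.
chunks = document.splitlines()

# Embed each chunk and keep (chunk, vector) pairs as an in-memory "vector DB".
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

question = "How is deep learning used in healthcare?"
top = retrieve(question)

# The retrieved chunks plus the question form the prompt sent to the LLM.
prompt = "Context:\n" + "\n".join(top) + "\n\nQuestion: " + question
print(top)
```

With this toy similarity, Chunk 3 scores highest because it shares the most words with the question. A real embedding model matches on meaning rather than exact words, but the pipeline shape is the same.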

Types of Chunking

1. Fixed-Size Chunking

Splits text into chunks of a fixed character or token count, regardless of where sentences or paragraphs end.

Example:

- 500 tokens per chunk
- 50-token overlap

Advantages

- Simple
- Fast

Disadvantages

- May cut sentences or ideas in half at chunk boundaries
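A minimal sketch of fixed-size chunking with overlap. It assumes whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the embedding model's own tokenizer. The function name and defaults are illustrative only.

```python
def fixed_size_chunks(tokens, size=500, overlap=50):
    """Split a token list into chunks of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this chunk already reached the end of the text
    return chunks

# Assumption: words approximate tokens for the purpose of the example.
tokens = [f"t{i}" for i in range(1200)]
chunks = fixed_size_chunks(tokens, size=500, overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Note that the split points depend only on position, which is exactly why this method can cut through the middle of a sentence.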

2. Recursive Chunking

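Tries a list of separators in order, from coarse to fine, and only falls back to the next one when a piece is still larger than the chunk size:

1. Paragraphs
2. Sentences
3. Words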

Commonly used in frameworks like:

- LangChain
- LlamaIndex

Advantage

Maintains semantic meaning better.
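The idea can be sketched as a small recursive function. This is not the implementation used by any particular framework; the separator list, the character-based size limit, and the name `recursive_split` are assumptions made for illustration.

```python
def recursive_split(text, max_chars, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse with finer ones only for oversized pieces."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # This piece alone is too big: retry it with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks

text = (
    "Para one is short.\n\n"
    "Para two has a first sentence. It also has a second sentence that is longer."
)
for chunk in recursive_split(text, max_chars=40):
    print(repr(chunk))
```

The short first paragraph survives as one chunk, while the long second paragraph is broken at a sentence boundary and then, where a sentence is still too long, at word boundaries. In this simplified version the separator at a split point is dropped from the output.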

3. Semantic Chunking

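Splits where the topic changes instead of at a fixed size, typically by embedding sentences and starting a new chunk when neighboring sentences stop being similar.

Example:

- Different topics become separate chunks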

Advantage

- Often higher retrieval quality

Disadvantage

- More computationally expensive, because the text has to be embedded just to decide where to split
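One common way to do this is to compare each sentence with the previous one and start a new chunk when similarity drops. The sketch below assumes a toy word-count `embed` in place of a real sentence-embedding model, and the `threshold` value is arbitrary; both would need to be replaced and tuned in practice.

```python
import math
import re
from collections import Counter

def embed(sentence):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.3):
    """Group consecutive sentences; start a new chunk when similarity to the previous sentence falls below threshold."""
    chunks, current = [], []
    for sentence in sentences:
        if current and cosine(embed(current[-1]), embed(sentence)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

sentences = [
    "Cats are small pets.",
    "Cats and dogs are popular pets.",
    "Stocks fell on Monday.",
    "Stocks and bonds fell again.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

Here the two pet sentences end up in one chunk and the two finance sentences in another. The extra cost mentioned above is visible in the loop: every sentence is embedded before any chunk exists.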

4. Sliding Window / Overlapping Chunking

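Consecutive chunks share some text at their boundaries. Overlap is usually combined with one of the strategies above rather than used on its own.

Example:

- Chunk 1: lines 1–5
- Chunk 2: lines 4–8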

Why overlap helps

A sentence or idea that straddles a chunk boundary is more likely to appear whole in at least one chunk.

Very common in production RAG systems.
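The line-based example above corresponds to a window of 5 lines that advances 3 lines at a time, so each chunk shares 2 lines with the previous one. A small sketch, with `sliding_window` and its parameters chosen purely for illustration:

```python
def sliding_window(lines, window=5, step=3):
    """Return overlapping windows of `window` lines, advancing `step` lines each time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append(lines[start:start + window])
        if start + window >= len(lines):
            break  # the last window already covers the final line
    return chunks

lines = [f"line {n}" for n in range(1, 9)]  # line 1 .. line 8
windows = sliding_window(lines, window=5, step=3)
for chunk in windows:
    print(chunk[0], "to", chunk[-1])
# line 1 to line 5
# line 4 to line 8
```

The overlap is `window - step`. Choosing `step` equal to `window` gives no overlap at all, and a smaller `step` gives more overlap at the cost of storing more chunks.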

Important Interview Concept: Chunk Size

Chunk size is a critical tuning parameter.

Too Small:

- Loses context
- More chunks must be retrieved to cover the same information

Too Large:

- Irrelevant information included
- Embedding quality drops, because one vector has to average over several topics

Typical chunk sizes:

200–1000 tokens

Common overlap:

10–20%
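To see what these numbers mean in practice, the arithmetic below converts an overlap percentage into tokens and estimates how many chunks a document produces. The helper names and the 10,000-token document are assumptions for the sake of the example.

```python
import math

def overlap_tokens(chunk_size, overlap_ratio):
    """Convert an overlap ratio (e.g. 0.10 for 10%) into a token count."""
    return round(chunk_size * overlap_ratio)

def chunk_count(total_tokens, chunk_size, overlap):
    """Number of chunks when each chunk starts (chunk_size - overlap) tokens after the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    step = chunk_size - overlap
    return 1 + math.ceil(max(0, total_tokens - chunk_size) / step)

size = 500
for ratio in (0.10, 0.20):
    overlap = overlap_tokens(size, ratio)
    print(ratio, overlap, chunk_count(10_000, size, overlap))
# 0.1 50 23
# 0.2 100 25
```

More overlap means more chunks to embed and store for the same document, which is the cost side of the trade-off.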

Interview Question:

“What happens if chunking is bad?”

Bad chunking causes:

- Incorrect retrieval
- Hallucinations
- Missing context
- Lower answer quality

Even a strong LLM performs poorly if retrieval quality is poor.

Chunking Strategy Depends On Data Type

Data Type	Best Chunking Style
PDFs	Recursive chunking
FAQs	Small semantic chunks
Code	Function/class-level chunks
Legal docs	Paragraph-based chunks
Chat logs	Conversation turns
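As one illustration of a data-type-specific strategy, the "Code" row can be sketched for Python source using the standard `ast` module. This only handles top-level functions and classes and assumes the source parses cleanly; the function name `code_chunks` and the sample source are made up for the example.

```python
import ast

def code_chunks(source):
    """Return (name, source_text) for each top-level function or class in Python source."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text of the node (decorators excluded).
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks

example = '''
import os

def load(path):
    return open(path).read()

class Store:
    def add(self, item):
        self.items.append(item)
'''

for name, text in code_chunks(example):
    print(name, len(text.splitlines()), "lines")
```

Each function or class stays intact as one chunk, instead of being cut at an arbitrary token count. Module-level statements such as imports are skipped in this sketch and would need their own handling.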

Real-World Example

Suppose a company has a 200-page HR policy PDF.

Without chunking:

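A single embedding for the entire PDF cannot point to any specific policy, and a document this long would typically exceed the embedding model's input limit.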

With chunking:

Each policy section becomes searchable:
- Leave policy
- Insurance
- Work-from-home rules

A user query then retrieves only the relevant sections.
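A rough sketch of section-based splitting for this kind of document. It assumes the PDF text has already been extracted and that section headings can be recognized by a marker (here the hypothetical prefix `## `); real PDFs usually need a proper extraction and heading-detection step first. The policy text is placeholder content.

```python
def split_by_heading(text, marker="## "):
    """Split text into sections at heading lines; text before the first heading is ignored."""
    sections = []
    title, body = None, []
    for line in text.splitlines():
        if line.startswith(marker):
            if title is not None:
                sections.append({"section": title, "text": "\n".join(body).strip()})
            title, body = line[len(marker):].strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections.append({"section": title, "text": "\n".join(body).strip()})
    return sections

# Placeholder text standing in for the extracted HR policy.
policy = """## Leave policy
Placeholder text about leave.

## Insurance
Placeholder text about insurance.

## Work-from-home rules
Placeholder text about remote work.
"""

sections = split_by_heading(policy)
for s in sections:
    print(s["section"], "->", s["text"])
```

Keeping the section title alongside each chunk as metadata makes it easy to show the user where an answer came from. Long sections can then be split further with any of the chunking strategies described earlier.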

Common Interview Questions

1. Why is overlap used in chunking?

To preserve context between neighboring chunks.

2. Which chunking strategy is best?

It depends on the data and use case:

- Fixed-size → simple and fast
- Recursive → most common default
- Semantic → often the best quality, at higher compute cost

3. What is an ideal chunk size?

There is no universal answer. A range of 300–800 tokens is a common starting point, to be tuned against retrieval quality on your own data.

4. Can chunking affect hallucination?

Yes. Poor chunking can retrieve irrelevant or incomplete information, increasing hallucinations.

Short Interview Answer

“Chunking in RAG is the process of splitting large documents into smaller meaningful pieces before generating embeddings. It improves retrieval accuracy, context relevance, and overall response quality. Common approaches include fixed-size, recursive, semantic, and overlapping chunking.”

Pro Interview Tip

If asked:

“What is more important in RAG: LLM or retrieval quality?”

A strong answer is:

“Retrieval quality is extremely important. Even a powerful LLM cannot generate accurate answers if the retrieved chunks are poor. Chunking plays a major role in retrieval quality.”