What is Chunking in RAG?

In Retrieval-Augmented Generation (RAG), chunking means splitting large documents into smaller pieces called chunks before storing them in a vector database.

These chunks are later retrieved and sent to the LLM as context for answering questions.

Why Chunking is Needed

LLMs and embedding models work better with smaller, focused text segments.

If you store an entire document as one embedding:

  • Retrieval becomes less accurate

  • Important details may get diluted

  • The document may exceed the token limit of the embedding model or the LLM

  • The retriever returns the whole document, including content irrelevant to the question

Chunking improves:

  • Retrieval accuracy

  • Semantic search quality

  • Context relevance

  • Response quality

Simple Flow in RAG

  1. Load documents

  2. Split documents into chunks

  3. Generate embeddings for each chunk

  4. Store embeddings in vector DB

  5. User asks a question

  6. Retrieve relevant chunks

  7. Send chunks + question to LLM

  8. Generate final answer

Example

Original Document

Artificial Intelligence is transforming healthcare.
Machine learning helps in disease prediction.
Deep learning is widely used in medical imaging.

After Chunking

Chunk 1
Artificial Intelligence is transforming healthcare.
Chunk 2
Machine learning helps in disease prediction.
Chunk 3
Deep learning is widely used in medical imaging.

Now if the user asks:

“How is deep learning used in healthcare?”

The retriever can directly fetch Chunk 3.
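
The flow and the example above can be tied together in a minimal sketch. It is not a production pipeline: the word-count "embedding", the in-memory list standing in for a vector DB, and the names embed, cosine, and retrieve are all assumptions made for illustration. A real system would call an embedding model and a vector store instead.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Steps 1-2: load the document and split it (here: one sentence per chunk).
document = """Artificial Intelligence is transforming healthcare.
Machine learning helps in disease prediction.
Deep learning is widely used in medical imaging."""
chunks = [line.strip() for line in document.splitlines() if line.strip()]

# Steps 3-4: embed each chunk and keep it in a list (stand-in for a vector DB).
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=1):
    """Steps 5-6: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

question = "How is deep learning used in healthcare?"
context = retrieve(question)

# Step 7: the prompt that would be sent to the LLM (the LLM call is omitted).
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

With this toy scoring, Chunk 3 ranks first because it shares the most words with the question ("deep", "learning", "is", "used", "in").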

Types of Chunking

1. Fixed-Size Chunking

Splits text based on character/token count.

Example:

  • 500 tokens per chunk

  • 50-token overlap

Advantages
  • Simple

  • Fast

Disadvantages
  • May break sentences or meaning
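
A minimal sketch of fixed-size chunking with overlap follows. It assumes that a "token" is simply a whitespace-separated word; a real system would count tokens with the tokenizer of its embedding model. The function name fixed_size_chunks is hypothetical.

```python
def fixed_size_chunks(tokens, size=500, overlap=50):
    """Split a token list into chunks of `size` tokens sharing `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # how far the window advances each step
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

# Small numbers so the overlap is easy to see (assumption: words as tokens).
tokens = "one two three four five six seven eight nine ten".split()
small_chunks = fixed_size_chunks(tokens, size=4, overlap=1)
for chunk in small_chunks:
    print(" ".join(chunk))
```

Each chunk starts with the last token of the previous one, which is the overlap. Note that the cut points ignore sentence boundaries, which is the disadvantage listed above.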

2. Recursive Chunking

Tries a hierarchy of separators in order, falling back to the next one only when a piece is still too large:

  • Paragraphs

  • Sentences

  • Words

Commonly used in frameworks like:

  • LangChain

  • LlamaIndex

Advantage

Maintains semantic meaning better.
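
The idea can be sketched in a few lines. This is not the implementation used by LangChain or LlamaIndex; it is a simplified illustration, and the function name recursive_split, the separator list, and the character-based size limit are all assumptions.

```python
def recursive_split(text, max_chars, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse with finer ones if needed."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in (p for p in text.split(sep) if p.strip()):
        candidate = current + sep + part if current else part
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small neighbours
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # This piece is still too large: retry with the finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

sample = (
    "Para one is short.\n\n"
    "Para two has a first sentence. It also has a second sentence."
)
pieces = recursive_split(sample, max_chars=40)
for piece in pieces:
    print(repr(piece))
```

In this sketch the first paragraph survives intact, and only the oversized second paragraph is split further at the sentence level. A separator that ends up on a chunk boundary is dropped, so the first sentence of the second paragraph loses its trailing period; a fuller implementation would keep it.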

3. Semantic Chunking

Splits based on meaning instead of size.

Example:

  • Different topics become separate chunks

Advantage

Higher retrieval quality

Disadvantage

More computationally expensive
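
One common way to approximate this is to compare neighboring sentences and start a new chunk where similarity drops. The sketch below assumes word overlap (Jaccard similarity) as a stand-in for embedding similarity; real semantic chunking would embed each sentence with a model, which is where the extra cost comes from. The sample sentences, the threshold, and the function names are hypothetical.

```python
import re

def words(sentence):
    """Lowercased set of words and numbers in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def jaccard(a, b):
    """Shared words divided by all words (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def semantic_chunks(sentences, threshold=0.05):
    """Start a new chunk whenever a sentence is dissimilar to the previous one."""
    groups = []
    for sentence in sentences:
        if groups and jaccard(words(groups[-1][-1]), words(sentence)) >= threshold:
            groups[-1].append(sentence)  # same topic: extend the current chunk
        else:
            groups.append([sentence])    # topic shift: open a new chunk
    return [" ".join(group) for group in groups]

# Made-up sentences covering two topics (leave, then insurance).
sentences = [
    "Employees get 20 days of paid leave.",
    "Unused leave days carry over to the next year.",
    "Health insurance covers employees and dependents.",
    "Insurance claims are filed through the portal.",
]
topic_chunks = semantic_chunks(sentences)
for chunk in topic_chunks:
    print(chunk)
```

The two leave sentences end up in one chunk and the two insurance sentences in another, regardless of their length.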

4. Sliding Window / Overlapping Chunking

Chunks overlap slightly.

Example:

Chunk 1: lines 1–5
Chunk 2: lines 4–8

Why overlap helps

Prevents losing context between chunks.

Very common in production RAG systems.

Important Interview Concept: Chunk Size

Chunk size is a critical tuning parameter.

Too Small:

  • Loses context

  • More chunks must be retrieved to cover a single answer

Too Large:

  • Irrelevant information included

  • The embedding averages over several topics, so it matches specific queries less precisely

Typical chunk sizes:

  • 200–1000 tokens

Common overlap:

  • 10–20%
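
The effect of these two parameters on index size can be estimated with simple arithmetic. The sketch below assumes fixed-size chunks whose window advances by (size - overlap) tokens each step; the 10,000-token document and the function name chunk_count are hypothetical.

```python
import math

def chunk_count(n_tokens, size, overlap):
    """Number of fixed-size chunks needed to cover n_tokens with the given overlap."""
    if n_tokens <= 0:
        return 0
    if n_tokens <= size:
        return 1
    stride = size - overlap  # tokens of new text contributed by each extra chunk
    return math.ceil((n_tokens - overlap) / stride)

n = 10_000  # hypothetical document length in tokens
for size in (200, 500, 1000):
    overlap = size // 10  # 10% overlap
    print(size, overlap, chunk_count(n, size, 0), chunk_count(n, size, overlap))
```

For the 500-token case, a 10% overlap raises the count from 20 to 23 chunks, so overlap buys context at the cost of a somewhat larger index.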

Interview Question:

“What happens if chunking is bad?”

Bad chunking causes:

  • Incorrect retrieval

  • Hallucinations

  • Missing context

  • Lower answer quality

Even a strong LLM performs poorly if retrieval quality is poor.

Chunking Strategy Depends On Data Type

Data Type   | Best Chunking Style
PDFs        | Recursive chunking
FAQs        | Small semantic chunks
Code        | Function/class-level chunks
Legal docs  | Paragraph-based chunks
Chat logs   | Conversation turns
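
As an illustration of the "Code" row, the sketch below chunks Python source at function and class level using the standard-library ast module. It assumes the code being indexed is valid Python; other languages would need their own parser. The function name code_chunks and the sample source are hypothetical.

```python
import ast

def code_chunks(source):
    """Return (name, source text) for each top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text of the node
            # (decorator lines above the def/class are not included).
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks

sample_source = '''
import math

def area(radius):
    return math.pi * radius ** 2

class Greeter:
    def hello(self):
        return "hi"
'''

code_pieces = code_chunks(sample_source)
for name, text in code_pieces:
    print(name, "->", len(text.splitlines()), "lines")
```

Each chunk is a complete unit of meaning (a whole function or class), so a question about one function retrieves its full body rather than an arbitrary slice of the file.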

Real-World Example

Suppose a company has a 200-page HR policy PDF.

Without chunking:

  • A single embedding for the entire PDF is too coarse to match a question about one specific policy.

With chunking:

  • Each policy section becomes searchable:

    • Leave policy

    • Insurance

    • Work-from-home rules

So user queries retrieve only relevant sections.
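
A rough sketch of section-level chunking for such a document is shown below. It assumes the PDF text has already been extracted and that each section title appears on its own line; the sample policy text, the title list, and the function name split_by_sections are made up for illustration.

```python
def split_by_sections(text, titles):
    """Group body lines under the most recent line that matches a known title."""
    sections = {}
    current_title, buffer = None, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in titles:
            if current_title is not None:
                sections[current_title] = " ".join(buffer)
            current_title, buffer = stripped, []
        elif stripped and current_title is not None:
            buffer.append(stripped)
    if current_title is not None:
        sections[current_title] = " ".join(buffer)
    return sections

# Hypothetical extracted text; real policy wording will differ.
policy_text = """Leave policy
Employees accrue paid leave monthly.

Insurance
Health cover starts on the joining date.

Work-from-home rules
Remote work needs manager approval.
"""
policy_titles = {"Leave policy", "Insurance", "Work-from-home rules"}

policy_sections = split_by_sections(policy_text, policy_titles)
for title, body in policy_sections.items():
    print(title, "->", body)
```

Keeping the section title alongside each chunk also gives the retriever useful metadata, for example to show the user which policy an answer came from.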

Common Interview Questions

1. Why is overlap used in chunking?

To preserve context between neighboring chunks.

2. Which chunking strategy is best?

Depends on the data and use case:

  • Fixed-size → simple

  • Recursive → most common

  • Semantic → often the highest retrieval quality, at higher compute cost

3. What is an ideal chunk size?

There is no universal answer. A common starting point:

  • 300–800 tokens, then tune against retrieval quality on your own data.

4. Can chunking affect hallucination?

Yes. Poor chunking can retrieve irrelevant or incomplete information, increasing hallucinations.

Short Interview Answer

“Chunking in RAG is the process of splitting large documents into smaller meaningful pieces before generating embeddings. It improves retrieval accuracy, context relevance, and overall response quality. Common approaches include fixed-size, recursive, semantic, and overlapping chunking.”

Pro Interview Tip

If asked:

“What is more important in RAG: LLM or retrieval quality?”

A strong answer is:

“Retrieval quality is extremely important. Even a powerful LLM cannot generate accurate answers if the retrieved chunks are poor. Chunking plays a major role in retrieval quality.”