How to Optimize RAG Retrieval for Accuracy
Improve RAG answer accuracy with better chunking and retrieval strategies.
Jay Banlasan
The AI Systems Guy
When you optimize RAG retrieval accuracy through better chunking strategies, the difference between a 60% accurate system and a 95% accurate one often comes down to how you split and index your documents. I see teams deploy RAG with default settings and wonder why answers are wrong. The fix is almost always in the chunking, not the model.
These are the techniques I use to push RAG accuracy into production-grade territory.
What You Need Before Starting
- A working RAG system with test questions and known answers
- Python 3.8+ with langchain and sentence-transformers
- Your document corpus indexed and queryable
- A test suite of 20+ question-answer pairs
Step 1: Fix Your Chunking Strategy
Default chunking splits text at arbitrary boundaries. Use semantic chunking instead:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def smart_chunk(text, chunk_size=800, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " "],
        length_function=len
    )
    return splitter.split_text(text)

# Test different chunk sizes
for size in [400, 600, 800, 1200]:
    chunks = smart_chunk(sample_text, chunk_size=size)
    print(f"Chunk size {size}: {len(chunks)} chunks, avg {sum(len(c) for c in chunks)//len(chunks)} chars")
```
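Chunk count alone doesn't tell you which size retrieves best; measure hit rate against your test questions. A minimal sketch of that comparison, using a toy word-overlap retriever as a stand-in for your real vector search (the `retrieve` function and sample data here are illustrative, not from this article):

```python
def retrieve(query, chunks, k=3):
    """Toy retriever: rank chunks by word overlap with the query.
    Swap in your real vector search when running this for real."""
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:k]

def hit_rate(test_pairs, chunks, k=3):
    """Fraction of questions whose expected phrase shows up in a retrieved chunk."""
    hits = 0
    for question, expected_phrase in test_pairs:
        if any(expected_phrase.lower() in c.lower() for c in retrieve(question, chunks, k)):
            hits += 1
    return hits / len(test_pairs)

# Compare candidate chunkings against the same test set, keep the best size
chunkings = {
    400: ["indexing uses an embedding model", "retrieval returns top chunks"],
    800: ["indexing uses an embedding model and retrieval returns top chunks"],
}
tests = [("how does retrieval work", "returns top chunks")]
best = max(chunkings, key=lambda size: hit_rate(tests, chunkings[size]))
```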
Step 2: Add Contextual Headers
Each chunk should carry context about where it came from:
```python
def add_context_to_chunks(chunks, document_title, section_headers):
    enhanced = []
    current_section = ""  # chunks before the first header get an empty section
    for chunk in chunks:
        for header in section_headers:
            if header in chunk:
                current_section = header
        contextual_chunk = f"Document: {document_title}\nSection: {current_section}\n\n{chunk}"
        enhanced.append(contextual_chunk)
    return enhanced
```
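One design note: a common pattern is to embed the context-prefixed text but keep the raw chunk for the generation prompt, so the header prefix improves retrieval without cluttering the context window. A self-contained sketch of that pairing (the record keys and sample data are illustrative):

```python
def build_index_records(chunks, document_title, section_headers):
    """Pair each raw chunk with a context-prefixed copy: embed one, display the other."""
    records = []
    current_section = ""
    for chunk in chunks:
        for header in section_headers:
            if header in chunk:
                current_section = header
        records.append({
            "embed_text": f"Document: {document_title}\nSection: {current_section}\n\n{chunk}",
            "display_text": chunk,  # goes into the generation prompt unchanged
        })
    return records

records = build_index_records(
    ["## Setup\nInstall the package.", "Run the server."],
    document_title="Ops Guide",
    section_headers=["## Setup"],
)
```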
Step 3: Use HyDE for Better Retrieval
Hypothetical Document Embeddings generate a hypothetical answer first, then search for matching content:
```python
import anthropic

client = anthropic.Anthropic()

def hyde_search(question, collection, model):
    # `collection` is your vector store collection (Chroma-style API);
    # `model` is the sentence-transformers model used to embed your chunks
    hypothetical = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"Write a short paragraph that would answer this question: {question}"
        }]
    )
    hyde_text = hypothetical.content[0].text
    hyde_embedding = model.encode(hyde_text).tolist()
    results = collection.query(query_embeddings=[hyde_embedding], n_results=5)
    return results
```
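HyDE can miss when the hypothetical answer is wrong, so it is worth querying with both the HyDE embedding and the raw question, then merging. A pure-Python merge over two Chroma-shaped result sets (the document lists below are made up for illustration):

```python
def merge_results(primary, secondary, n_results=5):
    """Merge two Chroma-style query results, de-duplicating by document
    text and keeping primary-first ordering."""
    merged, seen = [], set()
    for docs in (primary["documents"][0], secondary["documents"][0]):
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:n_results]

hyde_hits = {"documents": [["chunk a", "chunk b"]]}
plain_hits = {"documents": [["chunk b", "chunk c"]]}
merged = merge_results(hyde_hits, plain_hits)
```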
Step 4: Implement Re-Ranking
The initial retrieval gets candidates. Re-ranking picks the best ones:
```python
import re

def rerank_results(question, results, top_k=3):
    # reuses the `client` created in Step 3
    candidates = results["documents"][0]
    scored = []
    for i, doc in enumerate(candidates):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=10,
            messages=[{
                "role": "user",
                "content": f"Rate 1-10 how relevant this passage is to the question.\nQuestion: {question}\nPassage: {doc[:500]}\nScore (number only):"
            }]
        )
        # Parse defensively: the model may reply "8/10" or add extra words,
        # which would crash a bare int() conversion
        match = re.search(r"\d+", response.content[0].text)
        score = int(match.group()) if match else 0
        scored.append((score, i, doc))
    scored.sort(reverse=True)
    return [s[2] for s in scored[:top_k]]
```
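One LLM call per candidate adds cost and latency. A cheap lexical pre-filter can trim the candidate list before the LLM pass; the word-overlap scoring below is an illustrative stand-in, not something from this article:

```python
def lexical_prefilter(question, candidates, keep=5):
    """Keep the `keep` candidates with the highest word overlap with the
    question, so the LLM re-ranker scores fewer passages."""
    q_words = set(question.lower().split())
    scored = sorted(
        candidates,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:keep]

candidates = ["pricing page details", "how retrieval ranking works", "changelog"]
short_list = lexical_prefilter("how does ranking work", candidates, keep=2)
```

Run the LLM re-ranker only on `short_list`; with 20 initial candidates cut to 5, that is a 4x reduction in re-ranking calls.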
Step 5: Measure and Iterate
Build an evaluation pipeline:
```python
def evaluate_rag(test_set, query_fn):
    results = {"correct": 0, "partial": 0, "wrong": 0, "total": len(test_set)}
    for test in test_set:
        answer = query_fn(test["question"])
        evaluation = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            messages=[{
                "role": "user",
                "content": f"""Compare the generated answer to the expected answer.
Expected: {test['expected']}
Generated: {answer}
Rate: CORRECT (same meaning), PARTIAL (some right info), or WRONG.
Reply with one word only."""
            }]
        )
        rating = evaluation.content[0].text.strip().upper()
        # startswith, not substring: "CORRECT" is a substring of "INCORRECT"
        if rating.startswith("CORRECT"):
            results["correct"] += 1
        elif rating.startswith("PARTIAL"):
            results["partial"] += 1
        else:
            results["wrong"] += 1
    results["accuracy"] = round(results["correct"] / results["total"] * 100, 1)
    return results
```
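Once evaluate_rag works, run it per configuration (chunk size, with and without HyDE, with and without re-ranking) and compare. A small helper for picking the winner from result dicts of the shape evaluate_rag returns (the config names and numbers below are examples):

```python
def best_config(runs):
    """runs maps config name -> an evaluate_rag-style result dict.
    Ranks by accuracy, breaking ties by fewer outright wrong answers."""
    return max(runs, key=lambda name: (runs[name]["accuracy"], -runs[name]["wrong"]))

runs = {
    "chunk800_hyde": {"correct": 18, "partial": 1, "wrong": 1, "total": 20, "accuracy": 90.0},
    "chunk400_base": {"correct": 14, "partial": 3, "wrong": 3, "total": 20, "accuracy": 70.0},
}
winner = best_config(runs)
```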
What to Build Next
Add query expansion. When a user asks a short question, generate 2-3 variations and search with all of them. Merge the results. This catches relevant chunks that a single query formulation would miss.
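The merge step matters as much as the expansion itself. One common approach is reciprocal rank fusion over the ranked lists returned for each query variant, so chunks found by several variants rise to the top (the chunk names below are made up; generating the variants would be one more LLM call):

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Score each document by the sum of 1/(k + rank) over every list it
    appears in; documents retrieved by multiple query variants score highest."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

variant_results = [
    ["chunk a", "chunk b"],   # original question
    ["chunk b", "chunk c"],   # rephrased variant
    ["chunk b", "chunk a"],   # keyword-style variant
]
fused = reciprocal_rank_fusion(variant_results)
```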
Related Reading
- The Trust Framework for AI Decisions - accuracy as the foundation of AI trust
- The Measurement Framework That Actually Works - measuring RAG quality systematically
- Why Simplicity Beats Complexity in AI - simple optimizations that deliver the biggest accuracy gains