
How to Optimize RAG Retrieval for Accuracy

Improve RAG answer accuracy with better chunking and retrieval strategies.

Jay Banlasan

The AI Systems Guy

The difference between a 60% accurate RAG system and a 95% accurate one usually comes down to how you split and index your documents. I see teams deploy RAG with default settings and wonder why the answers are wrong. The fix is almost always in the chunking, not the model.

These are the techniques I use to push RAG accuracy into production-grade territory.

What You Need Before Starting

The examples below assume:

- Python with the langchain and anthropic packages installed
- An embedding model exposing an encode() method (a sentence-transformers style model)
- A vector store collection exposing a query() method (a ChromaDB-style API)
- An Anthropic API key configured for the client

Step 1: Fix Your Chunking Strategy

Default chunking splits text at arbitrary character boundaries, often mid-sentence. Use structure-aware splitting that breaks on headings and paragraphs before falling back to smaller units:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def smart_chunk(text, chunk_size=800, overlap=200):
    # Separators are tried in order: headings first, then paragraphs,
    # lines, sentences, and finally single spaces.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " "],
        length_function=len
    )
    return splitter.split_text(text)

# Test different chunk sizes (sample_text is any representative document string)
for size in [400, 600, 800, 1200]:
    chunks = smart_chunk(sample_text, chunk_size=size)
    print(f"Chunk size {size}: {len(chunks)} chunks, avg {sum(len(c) for c in chunks)//len(chunks)} chars")

Step 2: Add Contextual Headers

Each chunk should carry context about where it came from:

def add_context_to_chunks(chunks, document_title, section_headers):
    # Prepend the document title and nearest section header to each chunk.
    # Assumes section headers appear verbatim inside the chunk text.
    enhanced = []
    current_section = ""

    for chunk in chunks:
        for header in section_headers:
            if header in chunk:
                current_section = header

        contextual_chunk = f"Document: {document_title}\nSection: {current_section}\n\n{chunk}"
        enhanced.append(contextual_chunk)

    return enhanced

Step 3: Use HyDE for Better Retrieval

Hypothetical Document Embeddings (HyDE) flips the retrieval problem: generate a hypothetical answer first, then search for content that matches it:

import anthropic

client = anthropic.Anthropic()

def hyde_search(question, collection, model):
    # model: the embedding model (sentence-transformers style encoder)
    # collection: a ChromaDB-style vector store
    hypothetical = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"Write a short paragraph that would answer this question: {question}"
        }]
    )

    hyde_text = hypothetical.content[0].text
    hyde_embedding = model.encode(hyde_text).tolist()

    results = collection.query(query_embeddings=[hyde_embedding], n_results=5)
    return results

Step 4: Implement Re-Ranking

The initial retrieval gets candidates. Re-ranking picks the best ones:

import re

def rerank_results(question, results, top_k=3):
    candidates = results["documents"][0]

    scored = []
    for i, doc in enumerate(candidates):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=10,
            messages=[{
                "role": "user",
                "content": f"Rate 1-10 how relevant this passage is to the question.\nQuestion: {question}\nPassage: {doc[:500]}\nScore (number only):"
            }]
        )
        # Models sometimes wrap the number in words; extract the first digits
        # rather than calling int() on the raw text, which would crash.
        match = re.search(r"\d+", response.content[0].text)
        score = int(match.group()) if match else 0
        scored.append((score, i, doc))

    scored.sort(reverse=True)
    return [s[2] for s in scored[:top_k]]

Step 5: Measure and Iterate

Build an evaluation pipeline:

def evaluate_rag(test_set, query_fn):
    results = {"correct": 0, "partial": 0, "wrong": 0, "total": len(test_set)}

    for test in test_set:
        answer = query_fn(test["question"])
        evaluation = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            messages=[{
                "role": "user",
                "content": f"""Compare the generated answer to the expected answer.
Expected: {test['expected']}
Generated: {answer}
Rate: CORRECT (same meaning), PARTIAL (some right info), or WRONG.
Reply with one word only."""
            }]
        )
        rating = evaluation.content[0].text.strip().upper()
        # Check PARTIAL first: a reply like "PARTIALLY CORRECT" contains
        # "CORRECT" and would otherwise be miscounted.
        if "PARTIAL" in rating:
            results["partial"] += 1
        elif "CORRECT" in rating:
            results["correct"] += 1
        else:
            results["wrong"] += 1

    results["accuracy"] = round(results["correct"] / results["total"] * 100, 1)
    return results
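PARTIAL answers carry some value, so raw accuracy can hide progress between runs. A small helper (a convention I'm assuming here, not a standard metric) counts them as half credit:

```python
def weighted_accuracy(results):
    # Count PARTIAL answers as half credit; `results` is the dict
    # returned by evaluate_rag. One possible convention, not a standard.
    score = results["correct"] + 0.5 * results["partial"]
    return round(score / results["total"] * 100, 1)
```

A run with 7 correct, 2 partial, and 1 wrong scores 80.0 instead of 70.0, which makes chunking tweaks that turn WRONG answers into PARTIAL ones visible.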

What to Build Next

Add query expansion. When a user asks a short question, generate 2-3 variations and search with all of them. Merge the results. This catches relevant chunks that a single query formulation would miss.
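A sketch of the merge step, using reciprocal rank fusion (my choice here; any rank-based merge works). It assumes each search returns an ordered list of document IDs; generating the 2-3 question variations can reuse the same client.messages.create pattern from the earlier steps:

```python
def merge_ranked_results(result_lists, top_k=5):
    # Reciprocal rank fusion: each list votes 1/(60 + rank) for its documents,
    # so documents retrieved by several query variations accumulate higher
    # scores. 60 is the damping constant from the original RRF paper.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (60 + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, merge_ranked_results([["a", "b", "c"], ["b", "a"], ["c", "b"]], top_k=2) returns ["b", "a"]: "b" appears in all three lists, so it accumulates the highest score.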


Want this system built for your business?

Get a free assessment. We will map every system your business needs and show you the ROI.
