How to Optimize RAG Retrieval for Accuracy
Improve RAG answer accuracy with better chunking and retrieval strategies.
Jay Banlasan
The AI Systems Guy
When you optimize RAG retrieval accuracy through better chunking strategies, the difference between a 60% accurate system and a 95% accurate one often comes down to how you split and index your documents. I see teams deploy RAG with default settings and wonder why answers are wrong. The fix is almost always in the chunking, not the model.
These are the techniques I use to push RAG accuracy into production-grade territory.
What You Need Before Starting
- A working RAG system with test questions and known answers
- Python 3.8+ with langchain and sentence-transformers
- Your document corpus indexed and queryable
- A test suite of 20+ question-answer pairs
Step 1: Fix Your Chunking Strategy
Default chunking splits text at arbitrary boundaries. Use semantic chunking instead:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def smart_chunk(text, chunk_size=800, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " "],
        length_function=len
    )
    return splitter.split_text(text)

# Test different chunk sizes
for size in [400, 600, 800, 1200]:
    chunks = smart_chunk(sample_text, chunk_size=size)
    print(f"Chunk size {size}: {len(chunks)} chunks, avg {sum(len(c) for c in chunks)//len(chunks)} chars")
```
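Chunk count alone doesn't tell you which size retrieves best; measure hit rate against your test questions. A minimal sketch of that comparison, using a toy word-overlap retriever as a stand-in for your real vector search (the `retrieve` function and sample data here are illustrative, not from this article):

```python
def retrieve(query, chunks, k=3):
    """Toy retriever: rank chunks by word overlap with the query.
    Swap in your real vector search when running this for real."""
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:k]

def hit_rate(test_pairs, chunks, k=3):
    """Fraction of questions whose expected phrase shows up in a retrieved chunk."""
    hits = 0
    for question, expected_phrase in test_pairs:
        if any(expected_phrase.lower() in c.lower() for c in retrieve(question, chunks, k)):
            hits += 1
    return hits / len(test_pairs)

# Compare candidate chunkings against the same test set, keep the best size
chunkings = {
    400: ["indexing uses an embedding model", "retrieval returns top chunks"],
    800: ["indexing uses an embedding model and retrieval returns top chunks"],
}
tests = [("how does retrieval work", "returns top chunks")]
best = max(chunkings, key=lambda size: hit_rate(tests, chunkings[size]))
```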
Step 2: Add Contextual Headers
Each chunk should carry context about where it came from:
```python
def add_context_to_chunks(chunks, document_title, section_headers):
    enhanced = []
    current_section = ""  # chunks before the first header get an empty section
    for chunk in chunks:
        for header in section_headers:
            if header in chunk:
                current_section = header
        contextual_chunk = f"Document: {document_title}\nSection: {current_section}\n\n{chunk}"
        enhanced.append(contextual_chunk)
    return enhanced
```
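One design note: a common pattern is to embed the context-prefixed text but keep the raw chunk for the generation prompt, so the header prefix improves retrieval without cluttering the context window. A self-contained sketch of that pairing (the record keys and sample data are illustrative):

```python
def build_index_records(chunks, document_title, section_headers):
    """Pair each raw chunk with a context-prefixed copy: embed one, display the other."""
    records = []
    current_section = ""
    for chunk in chunks:
        for header in section_headers:
            if header in chunk:
                current_section = header
        records.append({
            "embed_text": f"Document: {document_title}\nSection: {current_section}\n\n{chunk}",
            "display_text": chunk,  # goes into the generation prompt unchanged
        })
    return records

records = build_index_records(
    ["## Setup\nInstall the package.", "Run the server."],
    document_title="Ops Guide",
    section_headers=["## Setup"],
)
```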
Step 3: Use HyDE for Better Retrieval
Hypothetical Document Embeddings generate a hypothetical answer first, then search for matching content:
```python
import anthropic

client = anthropic.Anthropic()

def hyde_search(question, collection, model):
    # `collection` is your vector store collection (Chroma-style API);
    # `model` is the sentence-transformers model used to embed your chunks
    hypothetical = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"Write a short paragraph that would answer this question: {question}"
        }]
    )
    hyde_text = hypothetical.content[0].text
    hyde_embedding = model.encode(hyde_text).tolist()
    results = collection.query(query_embeddings=[hyde_embedding], n_results=5)
    return results
```
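HyDE can miss when the hypothetical answer is wrong, so it is worth querying with both the HyDE embedding and the raw question, then merging. A pure-Python merge over two Chroma-shaped result sets (the document lists below are made up for illustration):

```python
def merge_results(primary, secondary, n_results=5):
    """Merge two Chroma-style query results, de-duplicating by document
    text and keeping primary-first ordering."""
    merged, seen = [], set()
    for docs in (primary["documents"][0], secondary["documents"][0]):
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:n_results]

hyde_hits = {"documents": [["chunk a", "chunk b"]]}
plain_hits = {"documents": [["chunk b", "chunk c"]]}
merged = merge_results(hyde_hits, plain_hits)
```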
Step 4: Implement Re-Ranking
The initial retrieval gets candidates. Re-ranking picks the best ones:
```python
import re

def rerank_results(question, results, top_k=3):
    # reuses the `client` created in Step 3
    candidates = results["documents"][0]
    scored = []
    for i, doc in enumerate(candidates):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=10,
            messages=[{
                "role": "user",
                "content": f"Rate 1-10 how relevant this passage is to the question.\nQuestion: {question}\nPassage: {doc[:500]}\nScore (number only):"
            }]
        )
        # Parse defensively: the model may reply "8/10" or add extra words,
        # which would crash a bare int() conversion
        match = re.search(r"\d+", response.content[0].text)
        score = int(match.group()) if match else 0
        scored.append((score, i, doc))
    scored.sort(reverse=True)
    return [s[2] for s in scored[:top_k]]
```
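One LLM call per candidate adds cost and latency. A cheap lexical pre-filter can trim the candidate list before the LLM pass; the word-overlap scoring below is an illustrative stand-in, not something from this article:

```python
def lexical_prefilter(question, candidates, keep=5):
    """Keep the `keep` candidates with the highest word overlap with the
    question, so the LLM re-ranker scores fewer passages."""
    q_words = set(question.lower().split())
    scored = sorted(
        candidates,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:keep]

candidates = ["pricing page details", "how retrieval ranking works", "changelog"]
short_list = lexical_prefilter("how does ranking work", candidates, keep=2)
```

Run the LLM re-ranker only on `short_list`; with 20 initial candidates cut to 5, that is a 4x reduction in re-ranking calls.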
Step 5: Measure and Iterate
Build an evaluation pipeline:
```python
def evaluate_rag(test_set, query_fn):
    results = {"correct": 0, "partial": 0, "wrong": 0, "total": len(test_set)}
    for test in test_set:
        answer = query_fn(test["question"])
        evaluation = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            messages=[{
                "role": "user",
                "content": f"""Compare the generated answer to the expected answer.
Expected: {test['expected']}
Generated: {answer}
Rate: CORRECT (same meaning), PARTIAL (some right info), or WRONG.
Reply with one word only."""
            }]
        )
        rating = evaluation.content[0].text.strip().upper()
        # startswith, not substring: "CORRECT" is a substring of "INCORRECT"
        if rating.startswith("CORRECT"):
            results["correct"] += 1
        elif rating.startswith("PARTIAL"):
            results["partial"] += 1
        else:
            results["wrong"] += 1
    results["accuracy"] = round(results["correct"] / results["total"] * 100, 1)
    return results
```
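Once evaluate_rag works, run it per configuration (chunk size, with and without HyDE, with and without re-ranking) and compare. A small helper for picking the winner from result dicts of the shape evaluate_rag returns (the config names and numbers below are examples):

```python
def best_config(runs):
    """runs maps config name -> an evaluate_rag-style result dict.
    Ranks by accuracy, breaking ties by fewer outright wrong answers."""
    return max(runs, key=lambda name: (runs[name]["accuracy"], -runs[name]["wrong"]))

runs = {
    "chunk800_hyde": {"correct": 18, "partial": 1, "wrong": 1, "total": 20, "accuracy": 90.0},
    "chunk400_base": {"correct": 14, "partial": 3, "wrong": 3, "total": 20, "accuracy": 70.0},
}
winner = best_config(runs)
```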
What to Build Next
Add query expansion. When a user asks a short question, generate 2-3 variations and search with all of them. Merge the results. This catches relevant chunks that a single query formulation would miss.
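The merge step matters as much as the expansion itself. One common approach is reciprocal rank fusion over the ranked lists returned for each query variant, so chunks found by several variants rise to the top (the chunk names below are made up; generating the variants would be one more LLM call):

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Score each document by the sum of 1/(k + rank) over every list it
    appears in; documents retrieved by multiple query variants score highest."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

variant_results = [
    ["chunk a", "chunk b"],   # original question
    ["chunk b", "chunk c"],   # rephrased variant
    ["chunk b", "chunk a"],   # keyword-style variant
]
fused = reciprocal_rank_fusion(variant_results)
```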
Related Reading
- The Trust Framework for AI Decisions - accuracy as the foundation of AI trust
- The Measurement Framework That Actually Works - measuring RAG quality systematically
- Why Simplicity Beats Complexity in AI - simple optimizations that deliver the biggest accuracy gains