How to Build a Citation System for RAG Answers
Show source citations for every AI answer to build user trust.
Jay Banlasan
The AI Systems Guy
A RAG citation system with source attribution for every answer builds the trust that makes AI answers usable for real decisions. I build these because "trust me" is not an acceptable answer from an AI system. Users need to verify claims, and citations link every statement to the specific document, page, and paragraph it came from.
This turns RAG from "magic black box" into "verifiable reference tool."
What You Need Before Starting
- A working RAG system with metadata-rich chunks
- Python 3.8+ with anthropic
- Chunk-level source tracking (document name, page, section)
- A frontend that can render inline citations
Step 1: Structure Citations in the Prompt
```python
import anthropic
import json

client = anthropic.Anthropic()

CITATION_PROMPT = """Answer the question using the numbered sources below.
For EVERY factual claim in your answer, add a citation like [1] or [2].
If multiple sources support a claim, cite all of them like [1][3].
If no source supports a claim, do not make that claim.
After your answer, list which sources you cited and why.

Sources:
{sources}"""
```
```python
from sentence_transformers import SentenceTransformer

# Load the embedding model once at import time, not on every query
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def query_with_citations(question, collection):
    query_embedding = embedder.encode(question).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=5)

    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    sources = []
    for i in range(len(results["ids"][0])):
        sources.append({
            "index": i + 1,
            "text": results["documents"][0][i],
            "metadata": results["metadatas"][0][i],
        })

    source_text = "\n\n".join(
        f"[{s['index']}] ({s['metadata']['source']}, p.{s['metadata'].get('page', 'N/A')})\n{s['text']}"
        for s in sources
    )

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=600,
        system=CITATION_PROMPT.format(sources=source_text),
        messages=[{"role": "user", "content": question}],
    )
    return {"answer": response.content[0].text, "sources": sources}
```
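To see what the model actually receives, here is the numbered source block built by the same join logic, run on two hypothetical chunks (the filenames and text are made up for illustration):

```python
# Hypothetical retrieved chunks, shaped like the sources list above
sources = [
    {"index": 1, "text": "Refunds are processed within 14 days.",
     "metadata": {"source": "refund-policy.pdf", "page": 3}},
    {"index": 2, "text": "Contact support via the help portal.",
     "metadata": {"source": "support-guide.pdf"}},  # no page metadata
]

source_text = "\n\n".join(
    f"[{s['index']}] ({s['metadata']['source']}, p.{s['metadata'].get('page', 'N/A')})\n{s['text']}"
    for s in sources
)
print(source_text)
# [1] (refund-policy.pdf, p.3)
# Refunds are processed within 14 days.
#
# [2] (support-guide.pdf, p.N/A)
# Contact support via the help portal.
```

Missing page metadata degrades to "N/A" instead of crashing, which matters because real document collections are never uniformly tagged.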
Step 2: Parse and Validate Citations
```python
import re

def extract_citations(answer_text):
    """Return the set of source indices cited in the answer, e.g. {1, 3}."""
    return set(int(m) for m in re.findall(r'\[(\d+)\]', answer_text))

def validate_citations(answer, sources):
    cited = extract_citations(answer)
    available = set(s["index"] for s in sources)
    invalid = cited - available          # model cited a source that doesn't exist
    uncited_sources = available - cited  # retrieved but never referenced
    return {
        "valid_citations": cited & available,
        "invalid_citations": invalid,
        "uncited_sources": uncited_sources,
        "all_valid": len(invalid) == 0,
    }
```
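The same citation logic as above, inlined for a quick standalone check on a made-up answer. Note how a hallucinated citation like [4] shows up as invalid:

```python
import re

answer = "Refunds take 14 days [1][4]. Contact support for help [2]."
available = {1, 2, 3}  # indices of the retrieved sources

cited = set(int(m) for m in re.findall(r'\[(\d+)\]', answer))
print(cited & available)   # valid citations: {1, 2}
print(cited - available)   # hallucinated citation: {4}
print(available - cited)   # retrieved but never cited: {3}
```

A non-empty invalid set is a strong signal to retry the generation or flag the answer for review, because the model is claiming support that does not exist.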
Step 3: Build Clickable Citation Links
```python
import html

def format_citations_for_frontend(answer, sources):
    citation_data = {}
    for source in sources:
        citation_data[source["index"]] = {
            "document": source["metadata"]["source"],
            "page": source["metadata"].get("page", "N/A"),
            "excerpt": source["text"][:200],
            "url": source["metadata"].get("url", ""),
        }

    def replace_citation(match):
        idx = int(match.group(1))
        if idx in citation_data:
            data = citation_data[idx]
            # Escape the document name so it can't break the HTML attribute
            doc = html.escape(str(data["document"]), quote=True)
            return (
                f'<cite data-source="{idx}" data-doc="{doc}" '
                f'data-page="{data["page"]}">[{idx}]</cite>'
            )
        return match.group(0)  # leave invalid citations untouched

    formatted = re.sub(r'\[(\d+)\]', replace_citation, answer)
    return {"html": formatted, "citation_data": citation_data}
```
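A minimal standalone run of the same rewriting idea, with one hypothetical citation, shows the `<cite>` markup the frontend receives:

```python
import re

answer = "Refunds take 14 days [1]."
citation_data = {1: {"document": "refund-policy.pdf", "page": 3}}

def replace_citation(match):
    idx = int(match.group(1))
    if idx in citation_data:
        d = citation_data[idx]
        return (f'<cite data-source="{idx}" data-doc="{d["document"]}" '
                f'data-page="{d["page"]}">[{idx}]</cite>')
    return match.group(0)

html_out = re.sub(r'\[(\d+)\]', replace_citation, answer)
print(html_out)
```

The frontend can then attach a click handler to every `cite` element and use the `data-*` attributes to open a source panel showing the excerpt and a link to the document.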
Step 4: Track Citation Quality
```python
import sqlite3

def log_citation_metrics(answer, sources):
    cited = extract_citations(answer)
    validation = validate_citations(answer, sources)
    conn = sqlite3.connect("citations.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS citation_log (
            total_sources INTEGER, cited_count INTEGER,
            valid_count INTEGER, logged_at TEXT
        )""")
    conn.execute("""
        INSERT INTO citation_log (total_sources, cited_count, valid_count, logged_at)
        VALUES (?, ?, ?, datetime('now'))
    """, (len(sources), len(cited), len(validation["valid_citations"])))
    conn.commit()
    conn.close()

def get_citation_report(days=30):
    conn = sqlite3.connect("citations.db")
    row = conn.execute(
        "SELECT AVG(cited_count), AVG(valid_count) FROM citation_log "
        "WHERE logged_at > datetime('now', ?)",
        (f"-{days} days",),
    ).fetchone()
    conn.close()
    return {
        "avg_citations_per_answer": round(row[0] or 0, 1),
        "avg_valid": round(row[1] or 0, 1),
    }
```
Step 5: Handle Citation Conflicts
When sources disagree, surface the conflict:
```python
CONFLICT_PROMPT = """If any of the sources below contradict each other, note the discrepancy.
State what each source says and let the user decide which applies to their situation.
Do not pick a side unless one source is clearly more recent."""
```
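One way to wire this in, sketched with abbreviated stand-ins for the full prompts defined earlier: prepend the conflict instruction to the citation system prompt so both apply to the same sources.

```python
# Abbreviated stand-ins for CITATION_PROMPT and CONFLICT_PROMPT above
CITATION_PROMPT = "Cite every claim with [n].\n\nSources:\n{sources}"
CONFLICT_PROMPT = ("If any of the sources below contradict each other, "
                   "note the discrepancy and state what each source says.")

# Prepend the conflict instruction so it applies before the sources
system_prompt = CONFLICT_PROMPT + "\n\n" + CITATION_PROMPT
print(system_prompt.format(sources="[1] (doc.pdf, p.1)\nExample chunk."))
```

Because `query_with_citations` fills in `{sources}` via `format`, the combined prompt drops in without changing the rest of the pipeline.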
What to Build Next
Add citation feedback. Let users click a citation and mark it as "relevant" or "not relevant." That feedback improves retrieval over time and helps you identify which documents need updating.
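A minimal sketch of that feedback logger, assuming the same SQLite database as Step 4; the table name and columns here are illustrative, not part of the original system:

```python
import sqlite3

def log_citation_feedback(citation_index, document, relevant, db_path="citations.db"):
    """Record a user's relevance vote for a clicked citation."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS citation_feedback (
            citation_index INTEGER, document TEXT,
            relevant INTEGER, logged_at TEXT DEFAULT (datetime('now'))
        )""")
    conn.execute(
        "INSERT INTO citation_feedback (citation_index, document, relevant) "
        "VALUES (?, ?, ?)",
        (citation_index, document, 1 if relevant else 0),
    )
    conn.commit()
    conn.close()
```

Aggregating votes per document then gives you a ranked list of sources that users keep marking as irrelevant, which are your first candidates for re-chunking or removal.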
Related Reading
- The Trust Framework for AI Decisions - citations as the foundation of AI trustworthiness
- AI-Powered Reporting That Actually Gets Read - cited reports are reports people trust
- The Measurement Framework That Actually Works - measuring citation accuracy and relevance
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment