How to Create a Conversational RAG Interface
Build a chat interface for natural conversations with your knowledge base.
Jay Banlasan
The AI Systems Guy
A conversational RAG interface turns your knowledge base into a conversation partner. I build these because single-question RAG feels like a search engine. Conversational RAG lets users drill down, ask follow-ups, and explore topics naturally. "Tell me about the vacation policy" followed by "What about rollover?" just works.
The trick is maintaining context across turns while still retrieving fresh content for each question.
What You Need Before Starting
- A working RAG system (see system 409)
- Python 3.8+ with anthropic, chromadb, and Flask
- Session management for conversation history
- A chat frontend
Step 1: Build the Conversation Manager
```python
from collections import defaultdict
from datetime import datetime

class ConversationManager:
    def __init__(self):
        self.sessions = defaultdict(lambda: {"messages": [], "created_at": datetime.now()})

    def add_message(self, session_id, role, content):
        self.sessions[session_id]["messages"].append({"role": role, "content": content})

    def get_history(self, session_id, max_turns=5):
        # A turn is a user/assistant pair, so keep the last max_turns * 2 messages
        messages = self.sessions[session_id]["messages"]
        return messages[-(max_turns * 2):]

    def get_context_query(self, session_id, new_question):
        # Prepend recent messages so a terse follow-up like "What about rollover?"
        # retrieves against the conversation topic, not just the fragment
        history = self.get_history(session_id, max_turns=3)
        if not history:
            return new_question
        recent_context = " ".join([m["content"] for m in history[-4:]])
        return f"{recent_context} {new_question}"

conversations = ConversationManager()
```
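The query-expansion idea in `get_context_query` can be exercised on its own. A minimal standalone sketch; the function name and the sample exchange are illustrative, not part of the manager class:

```python
def build_context_query(history, new_question, window=4):
    """Prepend the last few messages so a terse follow-up still
    embeds with the topic of the conversation."""
    if not history:
        return new_question
    recent = " ".join(m["content"] for m in history[-window:])
    return f"{recent} {new_question}"

history = [
    {"role": "user", "content": "Tell me about the vacation policy"},
    {"role": "assistant", "content": "Employees get 15 days of PTO per year."},
]
print(build_context_query(history, "What about rollover?"))
# "Tell me about the vacation policy Employees get 15 days of PTO per year. What about rollover?"
```

The expanded string is what gets embedded in Step 2, so the follow-up lands near the same documents as the original question.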
Step 2: Retrieve with Conversation Context
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def conversational_retrieve(session_id, question, collection, top_k=5):
    # Embed the history-expanded query so retrieval sees the conversation topic
    context_query = conversations.get_context_query(session_id, question)
    query_embedding = model.encode(context_query).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
    return results
```
Step 3: Generate Conversational Answers
```python
import anthropic

client = anthropic.Anthropic()

def chat_with_knowledge(session_id, question, collection):
    results = conversational_retrieve(session_id, question, collection)
    context = "\n\n".join(results["documents"][0])

    conversations.add_message(session_id, "user", question)
    history = conversations.get_history(session_id)

    system_prompt = f"""You are a knowledgeable assistant. Answer from the provided context.
Maintain conversation flow. Reference previous questions when relevant.
If the user says "what about X," connect it to the previous topic.

Context from knowledge base:
{context}"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        system=system_prompt,
        messages=history,
    )

    reply = response.content[0].text
    conversations.add_message(session_id, "assistant", reply)
    return reply
```
Step 4: Build the Chat API
```python
from flask import Flask, request, jsonify
import uuid

app = Flask(__name__)

@app.route("/api/chat/start", methods=["POST"])
def start_chat():
    session_id = str(uuid.uuid4())
    return jsonify({"session_id": session_id})

@app.route("/api/chat/message", methods=["POST"])
def send_message():
    session_id = request.json["session_id"]
    question = request.json["message"]
    # `collection` is the Chroma collection from your existing RAG system
    reply = chat_with_knowledge(session_id, question, collection)
    return jsonify({"reply": reply, "session_id": session_id})

@app.route("/api/chat/history/<session_id>", methods=["GET"])
def get_history(session_id):
    history = conversations.get_history(session_id, max_turns=50)
    return jsonify({"messages": history})
```
Step 5: Add Suggested Follow-Ups
After each answer, suggest related questions:
```python
def generate_followups(answer, question):
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"Based on this Q&A, suggest 3 brief follow-up questions the user might ask next.\nQ: {question}\nA: {answer}\n\nList only the questions, one per line."
        }]
    )
    return response.content[0].text.strip().split("\n")
```
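Models often return these suggestions with numbering or bullets despite the prompt, so it is worth normalizing the lines before sending them to the frontend. A small post-processing sketch; the helper name is mine, not part of the system above:

```python
import re

def clean_followups(raw_lines, limit=3):
    """Strip list numbering/bullets and blank lines from the model's
    follow-up suggestions, keeping at most `limit` questions."""
    cleaned = []
    for line in raw_lines:
        # Remove leading "1." / "2)" / "-" / "*" markers, then trim whitespace
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if text:
            cleaned.append(text)
    return cleaned[:limit]

print(clean_followups(["1. How many days roll over?", "", "- Is there a cap?"]))
# ['How many days roll over?', 'Is there a cap?']
```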
What to Build Next
Add conversation summarization. When a session exceeds 10 turns, summarize the earlier turns into a condensed context. This prevents the conversation from exceeding token limits while preserving the discussion thread.
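One way to sketch that trigger logic, with the actual LLM call injected as a callable so the windowing can be shown without an API key (function and message names are illustrative):

```python
def compact_history(messages, max_turns=10, summarize=None):
    """When the session exceeds max_turns user/assistant pairs, collapse
    the oldest messages into a single summary message. `summarize` is any
    callable taking the old messages and returning text; in production it
    would be an LLM call."""
    limit = max_turns * 2  # each turn is a user/assistant pair
    if len(messages) <= limit:
        return messages
    old, recent = messages[:-limit], messages[-limit:]
    summary_text = summarize(old) if summarize else "Earlier discussion omitted."
    summary = {"role": "user", "content": f"Summary of earlier conversation: {summary_text}"}
    return [summary] + recent

# Illustrative: 12 turns collapse to one summary plus the last 10 turns
msgs = [{"role": "user", "content": f"m{i}"} for i in range(24)]
compacted = compact_history(msgs, max_turns=10,
                            summarize=lambda old: f"{len(old)} earlier messages")
print(len(compacted))  # 21
```

Calling this inside `get_history` before returning keeps every prompt under the token budget while the summary message preserves the thread.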
Related Reading
- The Centralized Brain Concept - conversations as the interface to your knowledge brain
- AI in Customer Service - conversational interfaces for customer-facing knowledge
- The Feedback Loop That Powers Everything - conversation patterns improving knowledge base coverage
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment