
How to Create a Conversational RAG Interface

Build a chat interface for natural conversations with your knowledge base.

Jay Banlasan


The AI Systems Guy

A conversational RAG interface turns your knowledge base into a conversation partner. I build these because single-question RAG feels like a search engine. Conversational RAG lets users drill down, ask follow-ups, and explore topics naturally. "Tell me about the vacation policy" followed by "What about rollover?" just works.

The trick is maintaining context across turns while still retrieving fresh content for each question.

What You Need Before Starting

You need a populated vector store collection (the code below assumes a Chroma-style collection with a query method), the sentence-transformers and anthropic Python packages, an Anthropic API key, and Flask for the chat API.

Step 1: Build the Conversation Manager

from collections import defaultdict
from datetime import datetime

class ConversationManager:
    def __init__(self):
        # One entry per chat session; defaultdict creates sessions lazily
        self.sessions = defaultdict(lambda: {"messages": [], "created_at": datetime.now()})

    def add_message(self, session_id, role, content):
        self.sessions[session_id]["messages"].append({"role": role, "content": content})

    def get_history(self, session_id, max_turns=5):
        # Each turn is a user/assistant pair, hence max_turns * 2 messages
        messages = self.sessions[session_id]["messages"]
        return messages[-(max_turns * 2):]

    def get_context_query(self, session_id, new_question):
        # Blend recent messages into the query so follow-ups like
        # "What about rollover?" retrieve against the full topic
        history = self.get_history(session_id, max_turns=3)
        if not history:
            return new_question
        recent_context = " ".join([m["content"] for m in history[-4:]])
        return f"{recent_context} {new_question}"

conversations = ConversationManager()
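The `max_turns * 2` slice in `get_history` is easy to misread: each turn stores two messages, so the window is counted in messages, not turns. A quick illustration with dummy messages:

```python
# Each conversational turn produces two stored messages (user + assistant),
# which is why get_history slices with max_turns * 2. A dummy eight-message
# session trimmed to the last 3 turns:
messages = [
    {"role": "user", "content": "q1"}, {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"}, {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"}, {"role": "assistant", "content": "a3"},
    {"role": "user", "content": "q4"}, {"role": "assistant", "content": "a4"},
]
max_turns = 3
window = messages[-(max_turns * 2):]
print(len(window))            # 6 messages, i.e. 3 full turns
print(window[0]["content"])   # q2 -- the oldest surviving message
```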

Step 2: Retrieve with Conversation Context

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def conversational_retrieve(session_id, question, collection, top_k=5):
    context_query = conversations.get_context_query(session_id, question)
    query_embedding = model.encode(context_query).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
    return results
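One detail worth noting about the return value: Chroma nests results per input query, which is why the answer-generation code indexes `results["documents"][0]`. A mocked result (hypothetical documents, same shape) makes this concrete:

```python
# Shape of a Chroma-style query result for a single query embedding:
# every field is a list with one entry per input query.
results = {
    "documents": [[
        "Vacation policy: employees accrue 15 days per year.",
        "Rollover: up to 5 unused days carry into the next year.",
    ]],
    "distances": [[0.21, 0.34]],
}
# [0] selects the result list for our single query
context = "\n\n".join(results["documents"][0])
print(context.count("\n\n") + 1)  # 2 retrieved chunks
```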

Step 3: Generate Conversational Answers

import anthropic

client = anthropic.Anthropic()

def chat_with_knowledge(session_id, question, collection):
    results = conversational_retrieve(session_id, question, collection)
    context = "\n\n".join(results["documents"][0])

    conversations.add_message(session_id, "user", question)
    history = conversations.get_history(session_id)

    system_prompt = f"""You are a knowledgeable assistant. Answer from the provided context.
Maintain conversation flow. Reference previous questions when relevant.
If the user says "what about X," connect it to the previous topic.

Context from knowledge base:
{context}"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        system=system_prompt,
        messages=history
    )

    reply = response.content[0].text
    conversations.add_message(session_id, "assistant", reply)
    return reply
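A subtlety in the ordering above: the user turn is appended to the session before `get_history` runs, so the message list sent to the API ends with the new question and the model responds to it rather than to an earlier turn. With a mock session:

```python
# Mock of what get_history returns after add_message has stored the
# latest question: the list ends on the "user" message being answered.
history = [
    {"role": "user", "content": "Tell me about the vacation policy"},
    {"role": "assistant", "content": "Employees accrue 15 days per year."},
    {"role": "user", "content": "What about rollover?"},
]
print(history[-1]["role"])  # user
```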

Step 4: Build the Chat API

from flask import Flask, request, jsonify
import uuid

app = Flask(__name__)

@app.route("/api/chat/start", methods=["POST"])
def start_chat():
    session_id = str(uuid.uuid4())
    return jsonify({"session_id": session_id})

@app.route("/api/chat/message", methods=["POST"])
def send_message():
    session_id = request.json["session_id"]
    question = request.json["message"]

    # "collection" is the vector store collection loaded during indexing
    reply = chat_with_knowledge(session_id, question, collection)
    return jsonify({"reply": reply, "session_id": session_id})

@app.route("/api/chat/history/<session_id>", methods=["GET"])
def get_history(session_id):
    history = conversations.get_history(session_id, max_turns=50)
    return jsonify({"messages": history})

if __name__ == "__main__":
    app.run(port=5000)

Step 5: Add Suggested Follow-Ups

After each answer, suggest related questions:

def generate_followups(answer, question):
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"Based on this Q&A, suggest 3 brief follow-up questions the user might ask next.\nQ: {question}\nA: {answer}\n\nList only the questions, one per line."
        }]
    )
    # Drop blank lines the model sometimes includes between questions
    return [q.strip() for q in response.content[0].text.split("\n") if q.strip()]
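Models occasionally return numbered or bulleted lines despite the instruction. A small cleanup helper (hypothetical, not part of any SDK) makes the suggestions safe to render:

```python
import re

def clean_followups(raw_lines):
    """Strip list markers ("1.", "1)", "-", "*") and blank lines
    from model output before showing suggestions to the user."""
    cleaned = []
    for line in raw_lines:
        line = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if line:
            cleaned.append(line)
    return cleaned

print(clean_followups(["1. How many days roll over?", "", "- Is there a cap?"]))
# ['How many days roll over?', 'Is there a cap?']
```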

What to Build Next

Add conversation summarization. When a session exceeds 10 turns, summarize the earlier turns into a condensed context. This prevents the conversation from exceeding token limits while preserving the discussion thread.
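A minimal sketch of that trigger, with the LLM call injected as a `summarize` callable so the windowing logic stays testable (function and parameter names are my own):

```python
def compact_history(messages, summarize, turn_limit=10, keep_turns=4):
    """When a session exceeds turn_limit turns, summarize the older
    messages and keep only the recent ones verbatim. Returns
    (summary_or_None, messages). Fold the summary into the system
    prompt rather than the message list, which keeps the
    user/assistant alternation intact."""
    if len(messages) <= turn_limit * 2:
        return None, messages
    keep = keep_turns * 2
    older, recent = messages[:-keep], messages[-keep:]
    return summarize(older), recent

# Stand-in summarizer for illustration; production code would call the LLM
fake_summarize = lambda msgs: f"{len(msgs)} earlier messages about PTO policy"

long_session = [{"role": "user", "content": f"m{i}"} for i in range(24)]
summary, recent = compact_history(long_session, fake_summarize)
print(summary)      # 16 earlier messages about PTO policy
print(len(recent))  # 8
```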
