How to Build a RAG System with Your Business Documents
Create a retrieval-augmented generation system for accurate answers from your data.
Jay Banlasan
The AI Systems Guy
When you build a RAG system on your business documents, your team gets accurate answers from your own data instead of generic AI responses. I build these for businesses sitting on years of SOPs, meeting notes, contracts, and internal wikis that nobody can search effectively. RAG retrieves the relevant chunks from your documents and feeds them to the AI as context.
The AI answers from your data, not from its training data. That is the difference between useful and hallucinated.
What You Need Before Starting
- Business documents in PDF, DOCX, TXT, or Markdown format
- Python 3.8+ with langchain, chromadb, sentence-transformers, flask, and anthropic
- Enough disk space for your vector store
- An Anthropic API key
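The dependencies above can be installed in one line (package names assumed current at the time of writing; `pypdf` is needed by the PDF loader used below):

```shell
pip install langchain langchain-community chromadb sentence-transformers anthropic flask pypdf
```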
Step 1: Load and Chunk Your Documents
```python
# In LangChain 0.1+, loaders live in the langchain-community package.
# Older versions: from langchain.document_loaders import ...
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_documents(docs_path):
    loaders = {
        "**/*.pdf": PyPDFLoader,
        "**/*.txt": TextLoader,
        "**/*.md": TextLoader,
        # DOCX needs its own loader, e.g. Docx2txtLoader (docx2txt package)
    }
    all_docs = []
    for pattern, loader_cls in loaders.items():
        loader = DirectoryLoader(docs_path, glob=pattern, loader_cls=loader_cls)
        all_docs.extend(loader.load())
    return all_docs

def chunk_documents(documents, chunk_size=1000, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n\n", "\n", ". ", " "],
    )
    return splitter.split_documents(documents)
```
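To see what chunk overlap buys you, here is a minimal character-based splitter sketch. This is not the LangChain implementation, just the idea: each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.

```python
def sliding_chunks(text, chunk_size=20, overlap=5):
    """Split text into fixed-size windows that share `overlap` characters."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = sliding_chunks("Refunds are issued within 30 days of purchase.", 20, 5)
# Consecutive chunks share their 5-character boundary region.
print(chunks[0][-5:] == chunks[1][:5])  # True
```

The 1000/200 defaults in `chunk_documents` apply the same principle at paragraph scale.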
Step 2: Create the Vector Store
```python
from sentence_transformers import SentenceTransformer
import chromadb

model = SentenceTransformer("all-MiniLM-L6-v2")
chroma = chromadb.PersistentClient(path="./business_rag")
collection = chroma.get_or_create_collection("documents")

def index_chunks(chunks):
    for i, chunk in enumerate(chunks):
        embedding = model.encode(chunk.page_content).tolist()
        collection.add(
            ids=[f"chunk_{i}"],
            embeddings=[embedding],
            documents=[chunk.page_content],
            metadatas=[{
                "source": chunk.metadata.get("source", "unknown"),
                "page": chunk.metadata.get("page", 0),
            }],
        )
    print(f"Indexed {len(chunks)} chunks")
```
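One caveat with the loop above: the ids are positional (`chunk_0`, `chunk_1`, ...), so re-indexing after documents change can silently overwrite unrelated entries. A sketch of content-derived ids instead, using a hash of source and text (a hypothetical helper, with the assumption that identical chunks deduping to one id is acceptable):

```python
import hashlib

def chunk_id(source, text):
    """Stable id derived from the source path and chunk text."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return f"chunk_{digest[:16]}"

# Same content always maps to the same id, so re-indexing is idempotent.
print(chunk_id("sop.pdf", "Refunds within 30 days") ==
      chunk_id("sop.pdf", "Refunds within 30 days"))  # True
```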
Step 3: Build the Query Pipeline
```python
import anthropic

client = anthropic.Anthropic()

def query_rag(question, top_k=5):
    query_embedding = model.encode(question).tolist()
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
    )
    context_chunks = results["documents"][0]
    sources = results["metadatas"][0]
    context = "\n\n---\n\n".join(context_chunks)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        system="Answer the question using ONLY the provided context. If the context does not contain the answer, say 'I could not find this in the available documents.' Always cite which document the answer comes from.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return {
        "answer": response.content[0].text,
        "sources": [s["source"] for s in sources],
    }
```
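One gap worth noting: the context string joins raw chunks, so the model never sees which document each chunk came from, and the "always cite" instruction has little to work with. A sketch of labeling each chunk with its source before joining (function and label format are my own, not from the original):

```python
def build_context(chunks, metadatas):
    """Prefix each retrieved chunk with its source so the model can cite it."""
    labeled = []
    for text, meta in zip(chunks, metadatas):
        source = meta.get("source", "unknown")
        labeled.append(f"[Source: {source}]\n{text}")
    return "\n\n---\n\n".join(labeled)

ctx = build_context(
    ["Refunds within 30 days.", "Submit PTO via the HR portal."],
    [{"source": "refund_policy.pdf"}, {"source": "hr_handbook.pdf"}],
)
print(ctx.startswith("[Source: refund_policy.pdf]"))  # True
```

Swapping this in for the plain `"\n\n---\n\n".join(context_chunks)` line lets the model ground its citations in real filenames.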
Step 4: Add the API Layer
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/ask", methods=["POST"])
def ask():
    question = request.json["question"]
    result = query_rag(question)
    return jsonify(result)

@app.route("/api/index", methods=["POST"])
def reindex():
    docs_path = request.json.get("path", "./documents")
    documents = load_documents(docs_path)
    chunks = chunk_documents(documents)
    index_chunks(chunks)
    return jsonify({"indexed": len(chunks)})

if __name__ == "__main__":
    app.run(port=5000)
```
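Once the server is running, both endpoints can be exercised with curl (port and payload shapes assumed from the code above):

```shell
# Index the documents directory, then ask a question
curl -X POST http://localhost:5000/api/index \
  -H "Content-Type: application/json" \
  -d '{"path": "./documents"}'

curl -X POST http://localhost:5000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is our refund policy?"}'
```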
Step 5: Test and Validate
```python
def validate_rag(test_questions):
    results = []
    for q in test_questions:
        answer = query_rag(q["question"])
        results.append({
            "question": q["question"],
            "expected": q["expected_answer"],
            "actual": answer["answer"],
            "sources": answer["sources"],
        })
    return results

test_set = [
    {"question": "What is our refund policy?", "expected_answer": "30-day refund policy"},
    {"question": "How do I request time off?", "expected_answer": "Submit through HR portal"},
]
```
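Eyeballing `validate_rag` output gets tedious past a handful of questions. One rough automatic check (a hypothetical helper, not a substitute for human review) is whether the expected answer's keywords show up in the actual answer:

```python
def keyword_hit_rate(expected, actual):
    """Fraction of expected-answer words (4+ chars) found in the actual answer."""
    keywords = [w.lower() for w in expected.split() if len(w) >= 4]
    if not keywords:
        return 0.0
    hits = sum(1 for w in keywords if w in actual.lower())
    return hits / len(keywords)

score = keyword_hit_rate(
    "30-day refund policy",
    "Per refund_policy.pdf, we offer a 30-day refund policy on all plans.",
)
print(score)  # 1.0
```

Flag anything below, say, 0.5 for manual review before the system goes in front of your team.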
What to Build Next
Add document versioning. When a document gets updated, re-index only that document and keep track of which version was used for each answer. This creates an audit trail for compliance.
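A minimal sketch of the change-detection half of that, assuming you keep a content hash per document path (function and state shape are illustrative, not part of the build above):

```python
import hashlib

def detect_changes(current_files, index_state):
    """Return paths whose content hash differs from the stored index state."""
    changed = []
    for path, content in current_files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if index_state.get(path) != digest:
            changed.append(path)
            index_state[path] = digest  # record the version that was indexed
    return changed

state = {}
print(detect_changes({"sop.md": "v1"}, state))  # ['sop.md']  (new file)
print(detect_changes({"sop.md": "v1"}, state))  # []          (unchanged)
print(detect_changes({"sop.md": "v2"}, state))  # ['sop.md']  (updated)
```

Persist the state dict alongside the vector store, and store the digest in each chunk's metadata so every answer can be traced to the document version it came from.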
Related Reading
- The Centralized Brain Concept - RAG as the foundation of business intelligence
- Data Flow Architecture for Non-Engineers - how document data flows into AI answers
- Build vs Buy: The AI Framework - when to build custom RAG vs using a platform
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment