How to Build a RAG System with Your Business Documents
Create a retrieval-augmented generation system for accurate answers from your data.
Jay Banlasan
The AI Systems Guy
When you build a RAG system on your business documents, your team gets accurate answers from your own data instead of generic AI responses. I build these for businesses sitting on years of SOPs, meeting notes, contracts, and internal wikis that nobody can search effectively. RAG retrieves the relevant chunks from your documents and feeds them to the AI as context.
The AI answers from your data, not from its training data. That is the difference between useful and hallucinated.
What You Need Before Starting
- Business documents in PDF, DOCX, TXT, or Markdown format
- Python 3.8+ with langchain, chromadb, sentence-transformers, flask, and anthropic
- Enough disk space for your vector store
- An Anthropic API key
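The dependencies above can be installed in one line (package names assumed current at the time of writing; `pypdf` is needed by the PDF loader used below):

```shell
pip install langchain langchain-community chromadb sentence-transformers anthropic flask pypdf
```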
Step 1: Load and Chunk Your Documents
```python
# In LangChain 0.1+, loaders live in the langchain-community package.
# Older versions: from langchain.document_loaders import ...
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_documents(docs_path):
    loaders = {
        "**/*.pdf": PyPDFLoader,
        "**/*.txt": TextLoader,
        "**/*.md": TextLoader,
        # DOCX needs its own loader, e.g. Docx2txtLoader (docx2txt package)
    }
    all_docs = []
    for pattern, loader_cls in loaders.items():
        loader = DirectoryLoader(docs_path, glob=pattern, loader_cls=loader_cls)
        all_docs.extend(loader.load())
    return all_docs

def chunk_documents(documents, chunk_size=1000, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n\n", "\n", ". ", " "],
    )
    return splitter.split_documents(documents)
```
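To see what chunk overlap buys you, here is a minimal character-based splitter sketch. This is not the LangChain implementation, just the idea: each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.

```python
def sliding_chunks(text, chunk_size=20, overlap=5):
    """Split text into fixed-size windows that share `overlap` characters."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = sliding_chunks("Refunds are issued within 30 days of purchase.", 20, 5)
# Consecutive chunks share their 5-character boundary region.
print(chunks[0][-5:] == chunks[1][:5])  # True
```

The 1000/200 defaults in `chunk_documents` apply the same principle at paragraph scale.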
Step 2: Create the Vector Store
```python
from sentence_transformers import SentenceTransformer
import chromadb

model = SentenceTransformer("all-MiniLM-L6-v2")
chroma = chromadb.PersistentClient(path="./business_rag")
collection = chroma.get_or_create_collection("documents")

def index_chunks(chunks):
    for i, chunk in enumerate(chunks):
        embedding = model.encode(chunk.page_content).tolist()
        collection.add(
            ids=[f"chunk_{i}"],
            embeddings=[embedding],
            documents=[chunk.page_content],
            metadatas=[{
                "source": chunk.metadata.get("source", "unknown"),
                "page": chunk.metadata.get("page", 0),
            }],
        )
    print(f"Indexed {len(chunks)} chunks")
```
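One caveat with the loop above: the ids are positional (`chunk_0`, `chunk_1`, ...), so re-indexing after documents change can silently overwrite unrelated entries. A sketch of content-derived ids instead, using a hash of source and text (a hypothetical helper, with the assumption that identical chunks deduping to one id is acceptable):

```python
import hashlib

def chunk_id(source, text):
    """Stable id derived from the source path and chunk text."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return f"chunk_{digest[:16]}"

# Same content always maps to the same id, so re-indexing is idempotent.
print(chunk_id("sop.pdf", "Refunds within 30 days") ==
      chunk_id("sop.pdf", "Refunds within 30 days"))  # True
```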
Step 3: Build the Query Pipeline
```python
import anthropic

client = anthropic.Anthropic()

def query_rag(question, top_k=5):
    query_embedding = model.encode(question).tolist()
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
    )
    context_chunks = results["documents"][0]
    sources = results["metadatas"][0]
    context = "\n\n---\n\n".join(context_chunks)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        system="Answer the question using ONLY the provided context. If the context does not contain the answer, say 'I could not find this in the available documents.' Always cite which document the answer comes from.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return {
        "answer": response.content[0].text,
        "sources": [s["source"] for s in sources],
    }
```
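One gap worth noting: the context string joins raw chunks, so the model never sees which document each chunk came from, and the "always cite" instruction has little to work with. A sketch of labeling each chunk with its source before joining (function and label format are my own, not from the original):

```python
def build_context(chunks, metadatas):
    """Prefix each retrieved chunk with its source so the model can cite it."""
    labeled = []
    for text, meta in zip(chunks, metadatas):
        source = meta.get("source", "unknown")
        labeled.append(f"[Source: {source}]\n{text}")
    return "\n\n---\n\n".join(labeled)

ctx = build_context(
    ["Refunds within 30 days.", "Submit PTO via the HR portal."],
    [{"source": "refund_policy.pdf"}, {"source": "hr_handbook.pdf"}],
)
print(ctx.startswith("[Source: refund_policy.pdf]"))  # True
```

Swapping this in for the plain `"\n\n---\n\n".join(context_chunks)` line lets the model ground its citations in real filenames.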
Step 4: Add the API Layer
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/ask", methods=["POST"])
def ask():
    question = request.json["question"]
    result = query_rag(question)
    return jsonify(result)

@app.route("/api/index", methods=["POST"])
def reindex():
    docs_path = request.json.get("path", "./documents")
    documents = load_documents(docs_path)
    chunks = chunk_documents(documents)
    index_chunks(chunks)
    return jsonify({"indexed": len(chunks)})

if __name__ == "__main__":
    app.run(port=5000)
```
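Once the server is running, both endpoints can be exercised with curl (port and payload shapes assumed from the code above):

```shell
# Index the documents directory, then ask a question
curl -X POST http://localhost:5000/api/index \
  -H "Content-Type: application/json" \
  -d '{"path": "./documents"}'

curl -X POST http://localhost:5000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is our refund policy?"}'
```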
Step 5: Test and Validate
```python
def validate_rag(test_questions):
    results = []
    for q in test_questions:
        answer = query_rag(q["question"])
        results.append({
            "question": q["question"],
            "expected": q["expected_answer"],
            "actual": answer["answer"],
            "sources": answer["sources"],
        })
    return results

test_set = [
    {"question": "What is our refund policy?", "expected_answer": "30-day refund policy"},
    {"question": "How do I request time off?", "expected_answer": "Submit through HR portal"},
]
```
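Eyeballing `validate_rag` output gets tedious past a handful of questions. One rough automatic check (a hypothetical helper, not a substitute for human review) is whether the expected answer's keywords show up in the actual answer:

```python
def keyword_hit_rate(expected, actual):
    """Fraction of expected-answer words (4+ chars) found in the actual answer."""
    keywords = [w.lower() for w in expected.split() if len(w) >= 4]
    if not keywords:
        return 0.0
    hits = sum(1 for w in keywords if w in actual.lower())
    return hits / len(keywords)

score = keyword_hit_rate(
    "30-day refund policy",
    "Per refund_policy.pdf, we offer a 30-day refund policy on all plans.",
)
print(score)  # 1.0
```

Flag anything below, say, 0.5 for manual review before the system goes in front of your team.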
What to Build Next
Add document versioning. When a document gets updated, re-index only that document and keep track of which version was used for each answer. This creates an audit trail for compliance.
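A minimal sketch of the change-detection half of that, assuming you keep a content hash per document path (function and state shape are illustrative, not part of the build above):

```python
import hashlib

def detect_changes(current_files, index_state):
    """Return paths whose content hash differs from the stored index state."""
    changed = []
    for path, content in current_files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if index_state.get(path) != digest:
            changed.append(path)
            index_state[path] = digest  # record the version that was indexed
    return changed

state = {}
print(detect_changes({"sop.md": "v1"}, state))  # ['sop.md']  (new file)
print(detect_changes({"sop.md": "v1"}, state))  # []          (unchanged)
print(detect_changes({"sop.md": "v2"}, state))  # ['sop.md']  (updated)
```

Persist the state dict alongside the vector store, and store the digest in each chunk's metadata so every answer can be traced to the document version it came from.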
Related Reading
- The Centralized Brain Concept - RAG as the foundation of business intelligence
- Data Flow Architecture for Non-Engineers - how document data flows into AI answers
- Build vs Buy: The AI Framework - when to build custom RAG vs using a platform
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment