How to Build an AI-Powered Knowledge Base
Create a searchable knowledge base that uses AI to find answers.
Jay Banlasan
The AI Systems Guy
An AI-powered knowledge base for customer self-service replaces the keyword-only search bars most help centers ship with. I build these because traditional keyword search fails when customers do not use the exact terminology your docs use. Semantic search matches on meaning rather than exact words, so "I cannot get in" can surface your article about password resets.
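This setup uses semantic search with embeddings: every article and every query is converted to a vector, and the articles whose vectors sit closest to the query are returned. Customers type their question in plain language and get the most relevant articles, even when their wording differs from yours.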
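To make "closest vectors" concrete, here is a toy sketch of the comparison behind semantic search. The three-dimensional vectors are made-up values for illustration only (a real model such as all-MiniLM-L6-v2 produces 384-dimensional vectors), and `best_match` is a hypothetical helper, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made "embeddings" (hypothetical values, not real model output).
toy_vectors = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Updating your billing details": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.3, 0.1]  # stands in for the query "I cannot get in"

def best_match(query_vec, vectors):
    """Return the title whose vector is most similar to the query vector."""
    return max(vectors, key=lambda title: cosine_similarity(query_vec, vectors[title]))

print(best_match(query_vector, toy_vectors))
```

The query shares no words with either title; it wins on direction in vector space, which is the property the rest of this setup relies on.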
What You Need Before Starting
- Your help articles, docs, or FAQ content
- Python 3.8+ with sentence-transformers and ChromaDB
- Flask for the API layer
- The anthropic Python package and an API key, for the AI-generated answers in Step 3
- A frontend search interface
Step 1: Index Your Content
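    import json

    import chromadb
    from sentence_transformers import SentenceTransformer

    # Small general-purpose embedding model; loaded once and reused by every endpoint.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    chroma = chromadb.PersistentClient(path="./kb_store")
    # Use cosine distance so that 1 - distance works as a relevance score in Step 2.
    # Chroma's default space is squared L2. This setting only takes effect when the
    # collection is first created.
    collection = chroma.get_or_create_collection(
        "knowledge_base", metadata={"hnsw:space": "cosine"}
    )

    def index_articles():
        with open("articles.json") as f:
            articles = json.load(f)
        for article in articles:
            # Embed title and body together so either can match a query.
            text = f"{article['title']} {article['content']}"
            embedding = model.encode(text).tolist()
            # upsert (rather than add) lets you re-run indexing after editing articles.
            collection.upsert(
                ids=[str(article["id"])],  # Chroma requires string IDs
                embeddings=[embedding],
                documents=[text],
                metadatas=[{
                    "title": article["title"],
                    "url": article["url"],
                    "category": article["category"],
                    "updated_at": article["updated_at"],
                }],
            )
        print(f"Indexed {len(articles)} articles")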
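One caveat with indexing whole articles: as far as I know, all-MiniLM-L6-v2 truncates input past roughly 256 word pieces, so only the opening of a long article ends up in its embedding. A common workaround is to split each article into overlapping chunks and index every chunk. The sketch below is one way to do that; `chunk_words`, `chunk_ids`, the window sizes, and the `#chunk` ID scheme are all my own assumptions, not part of the setup above.

```python
def chunk_words(text, max_words=150, overlap=30):
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_words - overlap  # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

def chunk_ids(article_id, chunks):
    """Build one stable ID per chunk (hypothetical scheme: '<article id>#chunk<n>')."""
    return [f"{article_id}#chunk{i}" for i in range(len(chunks))]
```

Each chunk would then be embedded and stored with the same title and url metadata as its parent article, so a hit on any chunk still links to the right page.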
Step 2: Build the Search API
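    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/api/search", methods=["POST"])
    def search():
        # Reject missing or empty queries with a 400 instead of crashing with a 500.
        payload = request.get_json(silent=True) or {}
        query = (payload.get("query") or "").strip()
        if not query:
            return jsonify({"error": "query is required"}), 400
        top_k = payload.get("top_k", 5)
        query_embedding = model.encode(query).tolist()
        results = collection.query(
            query_embeddings=[query_embedding],
            n_results=top_k,
        )
        articles = []
        # Chroma returns one list per query; we sent a single query, hence the [0].
        for i in range(len(results["ids"][0])):
            articles.append({
                "id": results["ids"][0][i],
                "title": results["metadatas"][0][i]["title"],
                "url": results["metadatas"][0][i]["url"],
                # 1 - distance is a similarity score only for cosine distance.
                # Under Chroma's default squared-L2 space it can go negative.
                "relevance": round(1 - results["distances"][0][i], 3),
                "snippet": results["documents"][0][i][:200],
            })
        return jsonify({"query": query, "results": articles})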
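A vector search always returns its nearest neighbors, even when nothing in the index is actually relevant. One option is to drop weak hits before they reach the customer. This is a minimal sketch that assumes the collection uses cosine distance (so `1 - distance` is a similarity) and that the input has the same shape as the `collection.query` result used above. `filter_results` and the 0.3 threshold are hypothetical; tune the cutoff against your own queries.

```python
def filter_results(results, min_relevance=0.3):
    """Keep only hits whose cosine similarity is at least min_relevance."""
    kept = []
    for i, article_id in enumerate(results["ids"][0]):
        relevance = 1 - results["distances"][0][i]
        if relevance >= min_relevance:
            kept.append({
                "id": article_id,
                "title": results["metadatas"][0][i]["title"],
                "relevance": round(relevance, 3),
            })
    return kept

# Hand-written sample in the shape of a single-query result (made-up values).
sample = {
    "ids": [["a1", "a2"]],
    "distances": [[0.2, 0.9]],
    "metadatas": [[{"title": "Reset your password"}, {"title": "Billing"}]],
}
print(filter_results(sample))
```

An empty list after filtering is useful in its own right: it is a clear signal to show a "no good match" message and to record the query as a failed search.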
Step 3: Add AI-Generated Answers
Go beyond listing articles. Generate a direct answer from the relevant content:
    import anthropic

    # Reads the API key from the ANTHROPIC_API_KEY environment variable.
    ai_client = anthropic.Anthropic()

    @app.route("/api/ask", methods=["POST"])
    def ask():
        query = request.json["query"]
        query_embedding = model.encode(query).tolist()
        # Retrieve the three closest articles to ground the answer.
        results = collection.query(query_embeddings=[query_embedding], n_results=3)
        context = "\n\n".join(results["documents"][0])
        sources = results["metadatas"][0]
        response = ai_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=300,
            system="Answer the question using ONLY the provided context. If the context does not contain the answer, say so. Cite which article the answer comes from.",
            messages=[{
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {query}"
            }],
        )
        return jsonify({
            "answer": response.content[0].text,
            "sources": [{"title": s["title"], "url": s["url"]} for s in sources]
        })
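The system prompt asks the model to cite which article an answer comes from, which is easier when each article in the context carries an explicit label. The sketch below shows one way to build such a context with a rough size cap. `build_context`, the `[Article: ...]` label format, and the 4000-character budget are assumptions of mine, not something the Anthropic API requires.

```python
def build_context(documents, metadatas, max_chars=4000):
    """Join retrieved articles into one context string, each labeled with its title."""
    blocks = []
    used = 0
    for doc, meta in zip(documents, metadatas):
        block = f"[Article: {meta['title']}]\n{doc}"
        # Stop before exceeding the budget, but always keep the top hit.
        if blocks and used + len(block) > max_chars:
            break
        blocks.append(block[:max_chars])
        used += len(blocks[-1])
    return "\n\n".join(blocks)

print(build_context(
    ["Click 'Forgot password' on the login page.", "Invoices are under Settings."],
    [{"title": "Reset your password"}, {"title": "Billing"}],
))
```

In the endpoint this would replace the plain join, called as `build_context(results["documents"][0], results["metadatas"][0])`.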
Step 4: Track Search Analytics
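    import sqlite3
    from contextlib import closing

    def init_db():
        """Create the log table if needed; call once at startup. Column types are an assumption."""
        with closing(sqlite3.connect("kb_analytics.db")) as conn:
            conn.execute("""
                CREATE TABLE IF NOT EXISTS search_log (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    query TEXT NOT NULL,
                    results_count INTEGER NOT NULL,
                    clicked_article TEXT,
                    searched_at TEXT NOT NULL
                )
            """)
            conn.commit()

    def log_search(query, results_count, clicked_article=None):
        with closing(sqlite3.connect("kb_analytics.db")) as conn:
            conn.execute("""
                INSERT INTO search_log (query, results_count, clicked_article, searched_at)
                VALUES (?, ?, ?, datetime('now'))
            """, (query, results_count, clicked_article))
            conn.commit()

    def get_failed_searches(days=7):
        """Find queries with zero results or no clicks."""
        with closing(sqlite3.connect("kb_analytics.db")) as conn:
            return conn.execute("""
                SELECT query, COUNT(*) AS frequency
                FROM search_log
                WHERE (results_count = 0 OR clicked_article IS NULL)
                  AND searched_at > datetime('now', ?)
                GROUP BY query ORDER BY frequency DESC LIMIT 20
            """, (f"-{days} days",)).fetchall()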
Step 5: Auto-Suggest While Typing
    @app.route("/api/suggest", methods=["POST"])
    def suggest():
        partial_query = request.json["query"]
        # Skip very short input: too little signal for a meaningful embedding.
        if len(partial_query) < 3:
            return jsonify({"suggestions": []})
        embedding = model.encode(partial_query).tolist()
        results = collection.query(query_embeddings=[embedding], n_results=3)
        suggestions = [meta["title"] for meta in results["metadatas"][0]]
        return jsonify({"suggestions": suggestions})
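Because this endpoint fires on nearly every keystroke, the same partial queries get embedded over and over. A small in-process cache can absorb most of that. The sketch below uses `functools.lru_cache` from the standard library. `fake_lookup` is a stand-in for the encode-and-query step so the example runs on its own, and `suggest_titles` and the cache size are hypothetical.

```python
from functools import lru_cache

call_counter = {"lookups": 0}

def fake_lookup(normalized_query):
    """Stand-in for model.encode + collection.query; counts how often it runs."""
    call_counter["lookups"] += 1
    return (f"Result for {normalized_query}",)  # a tuple, so cached values stay immutable

@lru_cache(maxsize=1024)
def cached_suggestions(normalized_query):
    return fake_lookup(normalized_query)

def suggest_titles(partial_query):
    # Normalize case and whitespace so near-identical inputs share one cache entry.
    normalized = " ".join(partial_query.lower().split())
    if len(normalized) < 3:
        return []
    return list(cached_suggestions(normalized))

suggest_titles("Reset pass")
suggest_titles("  reset   PASS ")  # served from the cache, no second lookup
print(call_counter["lookups"])
```

The cache lives in one process and is never invalidated, so it would need clearing (`cached_suggestions.cache_clear()`) whenever articles are re-indexed.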
What to Build Next
Use failed search data to identify content gaps. Every week, review the top queries that returned no results or no clicks, and write new articles to fill those gaps. The knowledge base gets better as usage data shows you what is missing.
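For the weekly review, it helps to merge queries that differ only by case or spacing before ranking them, since the SQL groups on the exact query string. Here is a minimal sketch that takes rows shaped like the output of `get_failed_searches` (query, frequency). `content_gap_report` is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter

def content_gap_report(rows, top_n=5):
    """Merge failed-search rows that differ only by case or whitespace, then rank them."""
    totals = Counter()
    for query, frequency in rows:
        normalized = " ".join(query.lower().split())
        if normalized:
            totals[normalized] += frequency
    return totals.most_common(top_n)

# Made-up sample rows in (query, frequency) form.
rows = [("Export to CSV", 4), ("export to csv ", 3), ("cancel plan", 5), ("SSO setup", 1)]
for query, count in content_gap_report(rows, top_n=2):
    print(f"{count:>3}  {query}")
```

Without the merge, "cancel plan" would rank first here; with it, the two spellings of the CSV export question add up and move to the top of the writing queue.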