How to Create an Automated FAQ System from Support Tickets
Generate FAQ content automatically from common support ticket themes.
Jay Banlasan
The AI Systems Guy
When you automate FAQ generation from support tickets, your knowledge base is drafted from real customer questions. I build these because most FAQ pages are written by people guessing what customers ask. Your ticket data already contains the actual questions. The system extracts Q&A pairs from resolved tickets, clusters them by theme, merges each cluster into one clean entry, and surfaces new topics every week.
Your FAQ stays current without anyone maintaining it manually.
What You Need Before Starting
- A ticket database with resolved tickets and agent responses
- Python 3.8+ with sentence-transformers, scikit-learn 1.2+ (Step 2 uses the metric argument), NumPy, and the Anthropic SDK
- A knowledge base to publish FAQ entries to
- At least 100 resolved tickets for meaningful clustering
Step 1: Extract Question-Answer Pairs from Tickets
import anthropic
import json

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def extract_qa_pair(ticket):
    """Turn one resolved ticket into a question/answer/category dict."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,  # raised from 200 so longer answers do not truncate the JSON
        messages=[{
            "role": "user",
            "content": f"""Extract a clean FAQ entry from this support ticket.
Rewrite the question as a customer would ask it.
Rewrite the answer as a clear, helpful response.

Customer message: {ticket['body']}
Agent response: {ticket['resolution']}

Respond with JSON: {{"question": "...", "answer": "...", "category": "..."}}"""
        }]
    )
    # json.loads raises if the reply is not bare JSON; Step 4 catches that and skips the ticket.
    return json.loads(response.content[0].text)
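Models sometimes wrap JSON in a code fence or add a sentence before it, which makes a bare json.loads call fail. The helper below is a minimal sketch of a more forgiving parser; parse_json_reply is a hypothetical name, not part of the Anthropic SDK, and it assumes the reply contains exactly one JSON object.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_json_reply(text):
    """Return the JSON object embedded in a model reply, or raise ValueError."""
    # Drop fence lines so a fenced reply parses the same as a bare one.
    lines = [ln for ln in text.strip().splitlines()
             if not ln.strip().startswith(FENCE)]
    cleaned = "\n".join(lines)
    # Keep everything from the first "{" to the last "}".
    start = cleaned.find("{")
    end = cleaned.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    return json.loads(cleaned[start:end + 1])
```

If you adopt something like this, extract_qa_pair would return parse_json_reply(response.content[0].text) in place of the direct json.loads call.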
Step 2: Cluster Similar Questions
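from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def cluster_questions(qa_pairs, threshold=0.3):
    """Group Q&A pairs whose questions are within threshold cosine distance."""
    if len(qa_pairs) < 2:
        # AgglomerativeClustering needs at least two samples.
        return {0: list(qa_pairs)} if qa_pairs else {}
    questions = [qa["question"] for qa in qa_pairs]
    embeddings = model.encode(questions)
    clustering = AgglomerativeClustering(
        n_clusters=None,               # let the threshold decide how many clusters form
        distance_threshold=threshold,  # cosine distance, so lower means stricter grouping
        metric="cosine",               # scikit-learn 1.2+; older releases call this affinity
        linkage="average"
    )
    labels = clustering.fit_predict(embeddings)
    clusters = {}
    for qa, label in zip(qa_pairs, labels):
        clusters.setdefault(int(label), []).append(qa)
    return clusters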
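To get a feel for what a 0.3 threshold means, it helps to compute cosine distance by hand. The sketch below uses made-up three-dimensional vectors in place of real embeddings (all-MiniLM-L6-v2 produces 384 dimensions), so the numbers are illustrative only.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors standing in for question embeddings (assumed values).
refund_a = [0.9, 0.1, 0.0]
refund_b = [0.8, 0.3, 0.1]
login = [0.1, 0.2, 0.9]

THRESHOLD = 0.3
for name, vec in [("refund_b", refund_b), ("login", login)]:
    d = cosine_distance(refund_a, vec)
    verdict = "same theme" if d < THRESHOLD else "different theme"
    print(f"refund_a vs {name}: distance={d:.2f} -> {verdict}")
```

With average linkage, two clusters merge only while the mean pairwise distance between their members stays below the threshold. If unrelated questions end up together, lower the threshold; if obvious duplicates stay apart, raise it.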
Step 3: Generate Canonical FAQ Entries
For each cluster, pick the best question and merge answers:
def generate_canonical_faq(cluster_qas):
    """Merge one cluster of similar Q&A pairs into a single FAQ entry."""
    questions = [qa["question"] for qa in cluster_qas]
    answers = [qa["answer"] for qa in cluster_qas]
    # Only the first five examples are sent, to keep the prompt short.
    # chr(10) is a newline; f-string expressions cannot contain a backslash before Python 3.12.
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=600,  # raised from 300 so a merged answer does not truncate the JSON
        messages=[{
            "role": "user",
            "content": f"""These are variations of the same support question. Create ONE canonical FAQ entry.

Questions asked:
{chr(10).join(questions[:5])}

Answers given:
{chr(10).join(answers[:5])}

Write the clearest version of the question and the most complete answer.
Respond with JSON: {{"question": "...", "answer": "...", "frequency": {len(cluster_qas)}}}"""
        }]
    )
    return json.loads(response.content[0].text)
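The model's reply is parsed but never checked, so an entry with a missing answer or a miscounted frequency could reach the knowledge base. Here is one possible validation step; validate_faq_entry is a hypothetical helper, and the required fields are taken from the prompt above.

```python
def validate_faq_entry(entry, cluster_size):
    """Return a cleaned copy of a canonical FAQ entry, or raise ValueError."""
    if not isinstance(entry, dict):
        raise ValueError("entry must be a JSON object")
    cleaned = {}
    for key in ("question", "answer"):
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        cleaned[key] = value.strip()
    # Trust the count we computed, not the number the model echoed back.
    cleaned["frequency"] = cluster_size
    return cleaned
```

A natural place to call it is on the value returned by generate_canonical_faq, passing len(cluster_qas) as cluster_size.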
Step 4: Run the Pipeline Weekly
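def generate_weekly_faq_update():
    # get_resolved_tickets is your own helper; it should return dicts with "body" and "resolution".
    tickets = get_resolved_tickets(days=7)
    qa_pairs = []
    for ticket in tickets:
        try:
            qa_pairs.append(extract_qa_pair(ticket))
        except Exception as exc:
            # Skip tickets the model could not turn into valid JSON, but leave a trace.
            print(f"Skipped ticket: {exc}")
            continue
    if not qa_pairs:
        return []  # same type as the normal return, so callers can always iterate
    clusters = cluster_questions(qa_pairs)
    new_faqs = []
    for label, qas in clusters.items():
        if len(qas) >= 3:  # only themes raised at least three times this week
            faq = generate_canonical_faq(qas)
            faq["cluster_size"] = len(qas)
            new_faqs.append(faq)
    # Sort on the count we computed rather than the "frequency" value echoed by the model.
    new_faqs.sort(key=lambda f: f["cluster_size"], reverse=True)
    return new_faqs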
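The pipeline calls get_resolved_tickets without defining it, since that part depends on your help desk. As a rough sketch, here is what it might look like against a SQLite database. The tickets table, its columns, the 'resolved' status value, and the tickets.db path are all assumptions; resolved_at is assumed to hold ISO 8601 UTC timestamps stored as text.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def get_resolved_tickets(days=7, conn=None, db_path="tickets.db"):
    """Return tickets resolved in the last `days` days as a list of dicts."""
    own_conn = conn is None
    if own_conn:
        conn = sqlite3.connect(db_path)  # hypothetical database file
    # ISO 8601 strings in the same timezone sort chronologically as text.
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    try:
        rows = conn.execute(
            "SELECT id, body, resolution FROM tickets "
            "WHERE status = 'resolved' AND resolved_at >= ? "
            "ORDER BY resolved_at",
            (cutoff,),
        ).fetchall()
    finally:
        if own_conn:
            conn.close()
    return [{"id": r[0], "body": r[1], "resolution": r[2]} for r in rows]
```

Most hosted help desks expose the same data through an export or REST API instead; the only contract the pipeline relies on is a list of dicts with body and resolution keys.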
Step 5: Check for Duplicates Before Publishing
def is_duplicate(new_faq, existing_faqs, threshold=0.85):
    """Return (True, match) if an existing FAQ asks nearly the same question."""
    new_embedding = model.encode(new_faq["question"])
    for existing in existing_faqs:
        existing_embedding = model.encode(existing["question"])
        # Cosine similarity: values near 1.0 mean the two questions say the same thing.
        similarity = np.dot(new_embedding, existing_embedding) / (
            np.linalg.norm(new_embedding) * np.linalg.norm(existing_embedding)
        )
        if similarity > threshold:
            return True, existing
    return False, None

def publish_new_faqs(new_faqs):
    # get_existing_faqs, update_faq_answer and publish_faq are your knowledge base helpers.
    existing = get_existing_faqs()
    published = []
    for faq in new_faqs:
        is_dup, match = is_duplicate(faq, existing)
        if is_dup:
            # Refresh the existing entry instead of publishing a near-copy.
            update_faq_answer(match["id"], faq["answer"])
        else:
            publish_faq(faq)
            published.append(faq)
    return published
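The three knowledge base helpers used above are left to you. For local testing before wiring up a real knowledge base API, a JSON file can stand in. This is a sketch under that assumption: the faqs.json path, the generated id field, and the optional path argument are all hypothetical.

```python
import json
import os
import uuid

FAQ_PATH = "faqs.json"  # hypothetical location; swap in your knowledge base API

def get_existing_faqs(path=FAQ_PATH):
    """Load all stored FAQ entries, or an empty list if nothing is stored yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def _save(faqs, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(faqs, f, indent=2)

def publish_faq(faq, path=FAQ_PATH):
    """Append a new entry with a generated id and return that id."""
    faqs = get_existing_faqs(path)
    entry = dict(faq, id=uuid.uuid4().hex)
    faqs.append(entry)
    _save(faqs, path)
    return entry["id"]

def update_faq_answer(faq_id, answer, path=FAQ_PATH):
    """Replace the answer of an existing entry; return False if the id is unknown."""
    faqs = get_existing_faqs(path)
    for entry in faqs:
        if entry["id"] == faq_id:
            entry["answer"] = answer
            _save(faqs, path)
            return True
    return False
```

Because every helper defaults to FAQ_PATH, publish_new_faqs can call them with the same arguments it already uses.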
What to Build Next
Add popularity tracking. Monitor which FAQ entries get the most views and which still result in ticket creation. If customers read an FAQ and then submit a ticket anyway, the answer needs rewriting.
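One way to start is to compute, for each FAQ entry, the share of views that still ended in a ticket. The sketch below assumes you can log one event per FAQ view with a faq_id and a ticket_created flag; that event shape and the min_views cutoff are assumptions, not something your analytics tool necessarily provides.

```python
from collections import defaultdict

def faq_escalation_rates(view_events, min_views=20):
    """Rank FAQ entries by the share of views that still led to a ticket."""
    views = defaultdict(int)
    escalations = defaultdict(int)
    for event in view_events:
        views[event["faq_id"]] += 1
        if event["ticket_created"]:
            escalations[event["faq_id"]] += 1
    report = [
        {"faq_id": faq_id, "views": n, "escalation_rate": escalations[faq_id] / n}
        for faq_id, n in views.items()
        if n >= min_views  # ignore entries with too few views to judge
    ]
    # Highest escalation rate first: these answers are the rewrite candidates.
    report.sort(key=lambda r: r["escalation_rate"], reverse=True)
    return report
```

Entries at the top of the report are read often but fail to resolve the issue, which makes them the first candidates for a rewrite.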