How to Create an Automated FAQ System from Support Tickets

Generate FAQ content automatically from common support ticket themes.

Jay Banlasan

The AI Systems Guy

When you automate FAQ generation from support tickets, your knowledge base writes itself from real customer questions. I build these systems because most FAQ pages are written by people guessing what customers ask, while your ticket data already contains the actual questions. The system clusters tickets by theme, generates clean Q&A pairs, and flags new topics weekly.

Your FAQ stays current without anyone maintaining it manually.

What You Need Before Starting

A ticket database with resolved tickets and agent responses
Python 3.8+ with sentence-transformers, scikit-learn, NumPy, and the Anthropic SDK
A knowledge base to publish FAQ entries to
At least 100 resolved tickets for meaningful clustering

Step 1: Extract Question-Answer Pairs from Tickets

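import json

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

def extract_qa_pair(ticket):
    """Turn one resolved ticket (dict with 'body' and 'resolution') into a Q&A dict."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,  # leave room for the full JSON object; a truncated reply will not parse
        messages=[{
            "role": "user",
            "content": f"""Extract a clean FAQ entry from this support ticket.
Rewrite the question as a customer would ask it.
Rewrite the answer as a clear, helpful response.

Customer message: {ticket['body']}
Agent response: {ticket['resolution']}

Respond with JSON only, no surrounding text: {{"question": "...", "answer": "...", "category": "..."}}"""
        }]
    )
    # Raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(response.content[0].text)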
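
Models sometimes wrap JSON in a code fence or add a sentence before it, which makes a plain json.loads call fail. Here is a minimal sketch of a more forgiving parser, assuming the reply contains one JSON object. The name parse_model_json is a hypothetical helper, not part of the Anthropic SDK:

```python
import json

def parse_model_json(text):
    """Return the first JSON object found in a model reply.

    Hypothetical helper: tolerates code fences or stray prose around the object.
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value beginning at `start` and ignores what follows it.
    obj, _ = json.JSONDecoder().raw_decode(text, start)
    return obj
```

If you adopt something like this, call it on response.content[0].text in place of json.loads. It still raises ValueError on replies with no usable JSON, so the caller can skip that ticket.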

Step 2: Cluster Similar Questions

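from collections import defaultdict

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def cluster_questions(qa_pairs, threshold=0.3):
    """Group Q&A pairs whose questions fall within `threshold` cosine distance."""
    # AgglomerativeClustering needs at least two samples.
    if len(qa_pairs) < 2:
        return {0: list(qa_pairs)} if qa_pairs else {}

    questions = [qa["question"] for qa in qa_pairs]
    embeddings = model.encode(questions)

    clustering = AgglomerativeClustering(
        n_clusters=None,  # let the distance threshold decide how many clusters form
        distance_threshold=threshold,
        metric="cosine",  # scikit-learn 1.2+; older versions call this `affinity`
        linkage="average"
    )
    labels = clustering.fit_predict(embeddings)

    clusters = defaultdict(list)
    for qa, label in zip(qa_pairs, labels):
        clusters[int(label)].append(qa)

    return dict(clusters)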
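
The threshold is a cosine distance, so with average linkage a value of 0.3 means two groups keep merging while their average similarity stays above roughly 0.7. Here is a small illustration with made-up 2D vectors standing in for real embeddings. The vectors and the cosine_distance helper are for illustration only:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2D vectors standing in for question embeddings (not real model output).
reset_password = [1.0, 0.1]
forgot_password = [0.9, 0.3]
refund_status = [0.1, 1.0]

THRESHOLD = 0.3  # same default as cluster_questions

close = cosine_distance(reset_password, forgot_password)
far = cosine_distance(reset_password, refund_status)

print(close < THRESHOLD)  # below the threshold: these two may merge
print(far < THRESHOLD)    # above the threshold: these two stay apart
```

Lowering the threshold produces tighter, more numerous clusters. Raising it merges looser variations of a question into one theme.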

Step 3: Generate Canonical FAQ Entries

For each cluster, pick the best question and merge answers:

def generate_canonical_faq(cluster_qas):
    """Merge one cluster of similar Q&A pairs into a single FAQ entry."""
    # Send at most five samples to keep the prompt short.
    questions = [qa["question"] for qa in cluster_qas[:5]]
    answers = [qa["answer"] for qa in cluster_qas[:5]]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=600,  # room for a complete merged answer
        messages=[{
            "role": "user",
            "content": f"""These are variations of the same support question. Create ONE canonical FAQ entry.

Questions asked:
{chr(10).join(questions)}

Answers given:
{chr(10).join(answers)}

Write the clearest version of the question and the most complete answer.
Respond with JSON only: {{"question": "...", "answer": "..."}}"""
        }]
    )
    faq = json.loads(response.content[0].text)
    # Count the cluster in code instead of trusting the model to echo a number back.
    faq["frequency"] = len(cluster_qas)
    return faq

Step 4: Run the Pipeline Weekly

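def generate_weekly_faq_update():
    # get_resolved_tickets is your own helper: it should return dicts with
    # 'body' and 'resolution' keys for tickets resolved in the last `days` days.
    tickets = get_resolved_tickets(days=7)

    qa_pairs = []
    for ticket in tickets:
        try:
            qa_pairs.append(extract_qa_pair(ticket))
        except Exception as exc:
            # Skip tickets that fail extraction, but leave a trace of why.
            print(f"Skipped ticket: {exc}")
            continue

    if not qa_pairs:
        return []  # same type as the normal return, so callers can always iterate

    clusters = cluster_questions(qa_pairs)

    new_faqs = []
    for label, qas in clusters.items():
        if len(qas) >= 3:  # only themes seen at least three times this week
            faq = generate_canonical_faq(qas)
            faq["cluster_size"] = len(qas)
            new_faqs.append(faq)

    # Most frequent themes first.
    new_faqs.sort(key=lambda f: f["cluster_size"], reverse=True)
    return new_faqs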
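
The pipeline calls get_resolved_tickets without defining it, since that part depends on your help desk. Here is a minimal sketch, assuming tickets live in a SQLite table. The table name, column names, status value, and the db_path and conn parameters are all assumptions for illustration:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def get_resolved_tickets(days=7, db_path="tickets.db", conn=None):
    """Return resolved tickets from the last `days` days as dicts.

    Assumed schema (hypothetical): tickets(id, body, resolution, status, resolved_at),
    with resolved_at stored as an ISO 8601 UTC string at second precision.
    """
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat(timespec="seconds")
    own_conn = conn is None
    if own_conn:
        conn = sqlite3.connect(db_path)
    try:
        # ISO 8601 strings in one format sort chronologically, so >= works on text.
        rows = conn.execute(
            "SELECT id, body, resolution FROM tickets "
            "WHERE status = 'resolved' AND resolved_at >= ? "
            "ORDER BY resolved_at",
            (cutoff,),
        ).fetchall()
    finally:
        if own_conn:
            conn.close()
    return [{"id": r[0], "body": r[1], "resolution": r[2]} for r in rows]
```

If your tickets sit behind a help desk API instead, the same contract applies: return a list of dicts with body and resolution keys.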

Step 5: Check for Duplicates Before Publishing

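def is_duplicate(new_faq, existing_faqs, threshold=0.85):
    """Return (True, match) if an existing FAQ asks nearly the same question."""
    new_embedding = model.encode(new_faq["question"])

    for existing in existing_faqs:
        existing_embedding = model.encode(existing["question"])
        # Cosine similarity between the two question embeddings.
        similarity = np.dot(new_embedding, existing_embedding) / (
            np.linalg.norm(new_embedding) * np.linalg.norm(existing_embedding)
        )
        if similarity > threshold:
            return True, existing
    return False, None

def publish_new_faqs(new_faqs):
    # get_existing_faqs, update_faq_answer, and publish_faq are your own
    # knowledge base helpers; existing entries need 'id' and 'question' keys.
    existing = get_existing_faqs()
    published = []

    for faq in new_faqs:
        is_dup, match = is_duplicate(faq, existing)
        if is_dup:
            # Refresh the existing entry instead of publishing a second copy.
            update_faq_answer(match["id"], faq["answer"])
        else:
            publish_faq(faq)
            published.append(faq)

    return published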
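
The publishing step relies on three helpers that depend on your knowledge base. For a dry run before wiring up the real thing, an in-memory stand-in like this could work. The InMemoryKB class and its field names are assumptions, not a real knowledge base API:

```python
import itertools

class InMemoryKB:
    """Hypothetical stand-in for a knowledge base, useful for dry runs."""

    def __init__(self):
        self._entries = {}
        self._ids = itertools.count(1)

    def get_existing_faqs(self):
        return list(self._entries.values())

    def publish_faq(self, faq):
        entry = {"id": next(self._ids), "question": faq["question"], "answer": faq["answer"]}
        self._entries[entry["id"]] = entry
        return entry

    def update_faq_answer(self, faq_id, answer):
        self._entries[faq_id]["answer"] = answer

kb = InMemoryKB()
# Bind the function names the pipeline expects to the stand-in.
get_existing_faqs = kb.get_existing_faqs
publish_faq = kb.publish_faq
update_faq_answer = kb.update_faq_answer
```

Running the pipeline against the stand-in lets you review what would be published or overwritten before anything reaches customers.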

What to Build Next

Add popularity tracking. Monitor which FAQ entries get the most views and which still result in ticket creation. If customers read an FAQ and then submit a ticket anyway, the answer needs rewriting.
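
One way to sketch that check, assuming you can log each FAQ view along with whether a ticket followed it. The event format, the find_faqs_to_rewrite name, and the default cutoffs are illustrative assumptions:

```python
from collections import defaultdict

def find_faqs_to_rewrite(view_events, max_ticket_rate=0.3, min_views=20):
    """Return (faq_id, ticket_rate) for FAQs whose readers often still open a ticket.

    view_events: iterable of (faq_id, ticket_followed) tuples, one per FAQ view.
    The thresholds are illustrative defaults, not tuned values.
    """
    views = defaultdict(int)
    tickets = defaultdict(int)
    for faq_id, ticket_followed in view_events:
        views[faq_id] += 1
        if ticket_followed:
            tickets[faq_id] += 1

    flagged = []
    for faq_id, count in views.items():
        if count < min_views:
            continue  # too few views to judge
        rate = tickets[faq_id] / count
        if rate > max_ticket_rate:
            flagged.append((faq_id, rate))

    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

The min_views guard keeps a rarely read entry from being flagged on the strength of two or three visits.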
