How to Build Few-Shot Prompts for Consistent Output
Use example-based prompting to get reliable, formatted AI responses every time.
Jay Banlasan
The AI Systems Guy
Few-shot prompt engineering for consistent output is the fastest way to get a model to produce exactly the format and style you want without fine-tuning. Instead of describing what you want in abstract rules, you show the model three to five examples of the exact input-output pairs you expect. The model pattern-matches and produces outputs that follow your demonstrated format. I use this on every project where output format matters and where the cost of fine-tuning is not justified.
The technique works because large language models are trained to continue patterns. When you give them explicit examples, you are not asking them to follow instructions abstractly; you are showing them the pattern to continue.
What You Need Before Starting
- An API key for an LLM provider (the code below uses Anthropic's Claude)
- 3-10 high-quality input-output examples that represent your desired output format
- A clear task where format consistency is the problem you are solving
Step 1: Understand When Few-Shot Beats Zero-Shot
Zero-shot prompting: "Classify this support ticket as billing, technical, or account."
Few-shot prompting: the same instruction plus three examples showing the exact classification format.
Use few-shot when:
- Zero-shot produces correct content but inconsistent format
- Your output categories have overlap the model keeps confusing
- You need a specific writing style that is hard to describe in rules
- The task involves judgment calls you want the model to replicate
Zero-shot is fine when:
- The task is straightforward (summarize, translate)
- Format is simple (yes/no, single number)
- You are already getting consistent outputs
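Laid out side by side, the two prompt styles look like this. A minimal sketch; the ticket text and example pairs are illustrative, not from a real dataset:

```python
# Zero-shot: the instruction alone
zero_shot = (
    "Classify this support ticket as billing, technical, or account.\n"
    "Ticket: My card was charged twice."
)

# Few-shot: the same instruction plus demonstrated input-output pairs,
# ending exactly where the model should continue the pattern
few_shot = """Classify this support ticket as billing, technical, or account.

Input: My card was charged twice.
Output: billing

Input: The API is returning 500 errors.
Output: technical

Input: I can't get the password reset email.
Output: account

Input: My invoice shows the wrong plan.
Output:"""
```

The few-shot version ends mid-pattern on `Output:`, so the most natural continuation for the model is a single category label in the demonstrated format.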
Step 2: Build Your Example Set
Examples are the core of few-shot prompting. Quality beats quantity.
```python
# Good few-shot examples for a ticket classifier
TICKET_CLASSIFICATION_EXAMPLES = [
    {
        "input": "My card was charged twice for the same subscription.",
        "output": '{"category": "billing", "priority": "high", "action": "refund_check"}'
    },
    {
        "input": "The API is returning 500 errors when I try to fetch user data.",
        "output": '{"category": "technical", "priority": "urgent", "action": "engineering_escalate"}'
    },
    {
        "input": "I forgot my password and can't get the reset email.",
        "output": '{"category": "account", "priority": "normal", "action": "manual_reset"}'
    },
    {
        "input": "Would love to see dark mode added to the dashboard.",
        "output": '{"category": "feature_request", "priority": "low", "action": "log_feedback"}'
    },
    {
        "input": "I've been a customer for 3 years and this is unacceptable service.",
        "output": '{"category": "account", "priority": "high", "action": "csm_callback"}'
    }
]
```
Rules for good examples:
- Cover the range of inputs you expect in production
- Include edge cases that are genuinely ambiguous
- Every example output must be exactly the format you want
- No mediocre examples - each one teaches a pattern
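The rules above can be enforced mechanically before any examples reach a prompt. A small sketch, assuming the JSON schema from the classifier above; `REQUIRED_KEYS` is specific to that schema and should be adapted to your own format:

```python
import json

# Keys every example output must contain (assumed from the ticket schema)
REQUIRED_KEYS = {"category", "priority", "action"}

def validate_examples(examples: list) -> list:
    """Return (index, problem) pairs for examples that break the format."""
    problems = []
    for i, example in enumerate(examples):
        raw = example.get("output", "")
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            problems.append((i, "output is not valid JSON"))
            continue
        missing = REQUIRED_KEYS - parsed.keys()
        if missing:
            problems.append((i, f"missing keys: {sorted(missing)}"))
    return problems
```

Run this whenever you edit the example set; an empty result means every example demonstrates exactly the format you want the model to copy.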
Step 3: Build the Few-Shot Prompt Function
```python
import anthropic
import json
import re

client = anthropic.Anthropic()

def build_few_shot_prompt(
    task_instruction: str,
    examples: list,
    new_input: str,
    input_label: str = "Input",
    output_label: str = "Output"
) -> str:
    prompt_parts = [task_instruction, ""]
    for example in examples:
        prompt_parts.append(f"{input_label}: {example['input']}")
        prompt_parts.append(f"{output_label}: {example['output']}")
        prompt_parts.append("")
    prompt_parts.append(f"{input_label}: {new_input}")
    prompt_parts.append(f"{output_label}:")
    return "\n".join(prompt_parts)

def classify_ticket_few_shot(ticket_text: str) -> dict:
    instruction = """Classify support tickets. Return JSON only.
Schema: {"category": "billing|technical|account|feature_request|other", "priority": "urgent|high|normal|low", "action": string}"""
    prompt = build_few_shot_prompt(
        instruction,
        TICKET_CLASSIFICATION_EXAMPLES,
        ticket_text
    )
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}]
    )
    raw_output = response.content[0].text.strip()
    try:
        return json.loads(raw_output)
    except json.JSONDecodeError:
        # Fall back to extracting the first JSON object if there is extra text
        match = re.search(r'\{.*?\}', raw_output, re.DOTALL)
        if match:
            return json.loads(match.group())
        return {"error": "Parse failed", "raw": raw_output}

# Test
result = classify_ticket_few_shot("I need to update the billing email on my account.")
print(result)
```
Step 4: Few-Shot for Style and Voice
Few-shot is especially powerful for style replication. Give it samples of writing and it mimics the voice.
```python
STYLE_EXAMPLES = [
    {
        "input": "Write a follow-up for someone who attended our webinar on AI automation.",
        "output": "Hey [Name] - good to have you on the AI automation call. The workflow piece you asked about during Q&A - I can show you exactly how we built that. Worth a 20-minute call? I have Thursday afternoon free."
    },
    {
        "input": "Write a follow-up for someone who downloaded our pricing guide 3 days ago.",
        "output": "[Name] - you grabbed the pricing guide a few days back. Usually people have questions about the setup timeline after reading it. Anything I can clear up, or does the scope feel about right for what you're working on?"
    },
    {
        "input": "Write a follow-up for someone who went cold after two previous emails.",
        "output": "[Name], I'll keep this short. Still working on the [problem area]? If the timing's off, totally fine - just let me know and I'll check back in Q3. If you're ready to move, I can get you a live demo this week."
    }
]

def generate_follow_up_few_shot(context: str) -> str:
    instruction = "Write a short B2B follow-up email. Under 80 words. Conversational. End with one question. Use [Name] as placeholder."
    prompt = build_few_shot_prompt(instruction, STYLE_EXAMPLES, context)
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text.strip()
```
Step 5: Dynamic Few-Shot Selection
For tasks with many categories, selecting the most relevant examples for each input improves accuracy more than using the same fixed set every time.
```python
def select_relevant_examples(
    input_text: str,
    example_pool: list,
    n: int = 3
) -> list:
    """Select examples most similar to the input using basic keyword matching.
    For production, use embedding similarity instead."""
    input_words = set(input_text.lower().split())
    scored = []
    for example in example_pool:
        example_words = set(example["input"].lower().split())
        overlap = len(input_words & example_words)
        scored.append((overlap, example))
    # Sort by overlap score, highest first, and keep the top n
    scored.sort(key=lambda x: x[0], reverse=True)
    return [example for _, example in scored[:n]]

def classify_with_dynamic_examples(ticket_text: str) -> dict:
    relevant_examples = select_relevant_examples(
        ticket_text,
        TICKET_CLASSIFICATION_EXAMPLES,
        n=3
    )
    instruction = """Classify support tickets. Return JSON only.
Schema: {"category": "billing|technical|account|feature_request|other", "priority": "urgent|high|normal|low", "action": string}"""
    prompt = build_few_shot_prompt(instruction, relevant_examples, ticket_text)
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return json.loads(response.content[0].text.strip())
    except json.JSONDecodeError:
        return {"error": "parse_failed", "raw": response.content[0].text}
```
For production, replace keyword matching with embedding cosine similarity using a model like text-embedding-3-small.
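That swap can be sketched as follows. `embed_fn` is a placeholder for whatever embedding call you use (for example, a wrapper around text-embedding-3-small); in production you would precompute and cache the pool embeddings rather than re-embedding on every request:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_by_embedding(input_text: str, example_pool: list, embed_fn, n: int = 3) -> list:
    """Rank pool examples by embedding similarity to the input and return the top n.
    embed_fn maps a string to a vector; plug in your embedding API here."""
    input_vec = embed_fn(input_text)
    scored = [
        (cosine_similarity(input_vec, embed_fn(example["input"])), example)
        for example in example_pool
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [example for _, example in scored[:n]]
```

This is a drop-in replacement for `select_relevant_examples`: same inputs and output shape, just a better similarity signal.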
Step 6: Measure Consistency Across Runs
Few-shot prompts are more consistent than zero-shot but still vary. Measure it.
```python
from collections import Counter

def measure_consistency(
    prompt_fn,
    test_input: str,
    n_runs: int = 10
) -> dict:
    outputs = [prompt_fn(test_input) for _ in range(n_runs)]
    # For classification tasks, measure how often the same category appears
    if outputs and isinstance(outputs[0], dict):
        categories = [o.get("category", "error") for o in outputs]
        category_counts = Counter(categories)
        most_common = category_counts.most_common(1)[0]
        consistency_rate = most_common[1] / n_runs
        return {
            "consistency_rate": consistency_rate,
            "dominant_output": most_common[0],
            "distribution": dict(category_counts),
            "n_runs": n_runs
        }
    return {"outputs": outputs}

# Measure classification consistency
result = measure_consistency(
    classify_ticket_few_shot,
    "I need to cancel my subscription and get a refund for this month.",
    n_runs=5
)
print(f"Consistency: {result['consistency_rate']:.0%} agreed on '{result['dominant_output']}'")
```
Anything below 80% consistency on a clear-cut input means your examples need refinement. Add a counterexample for the case causing disagreement.
What to Build Next
- Build an example library organized by task type that you reuse across projects
- Add embedding-based example selection so prompts automatically pick the most relevant examples at runtime
- Create a consistency dashboard that tracks output stability across model versions
Related Reading
- Few-Shot Prompting: Teaching AI by Example
- Using JSON Mode for Reliable API Output
- Input, Process, Output: The Universal AI Framework
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment