How to Implement Chain-of-Thought Reasoning
Force AI models to show their work for more accurate complex reasoning.
Jay Banlasan
The AI Systems Guy
Chain-of-thought prompting is how you get accurate answers to multi-step problems that models routinely fail at when asked to answer directly. The technique forces the model to reason through each step before committing to an answer. On complex reasoning, math, and analysis tasks, accuracy typically jumps 20-40% compared to asking for the answer directly. I use this on any task where the answer requires more than two logical steps: deal analysis, troubleshooting trees, financial calculations, and anything where a wrong intermediate assumption would corrupt the final answer.
The intuition is straightforward: when you force the model to write out its reasoning, it catches its own errors during the generation process. Asking for an answer directly skips that self-correction.
What You Need Before Starting
- An Anthropic or OpenAI API key
- A task where accuracy on multi-step problems is the bottleneck
- Python 3.10+ with the `anthropic` SDK (`pip install anthropic`)
Step 1: The Simplest CoT Pattern
The most basic chain-of-thought pattern adds one sentence: "Think step by step before answering."
```python
import anthropic

client = anthropic.Anthropic()

def answer_direct(question: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=200,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text

def answer_with_cot(question: str) -> str:
    cot_question = f"{question}\n\nThink through this step by step before giving your final answer."
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=600,
        messages=[{"role": "user", "content": cot_question}]
    )
    return response.content[0].text

# Compare
question = "A SaaS company has $50k MRR growing at 8% month-over-month. What will their ARR be in 6 months?"

print("Direct answer:")
print(answer_direct(question))
print("\nChain-of-thought answer:")
print(answer_with_cot(question))
```
The direct answer often gives you a number without the intermediate calculations visible. The CoT answer shows the compounding formula applied correctly, and you can verify each step.
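For this particular question, you can check the model's arithmetic yourself. The verification below is plain Python (no API call) that applies the same compounding formula the CoT answer should show:

```python
# Sanity-check the compounding math the model should produce.
mrr_now = 50_000   # current MRR in dollars
growth = 0.08      # 8% month-over-month
months = 6

# Compound the MRR forward, then annualize: ARR = MRR x 12
mrr_future = mrr_now * (1 + growth) ** months
arr_future = mrr_future * 12

print(f"MRR in 6 months: ${mrr_future:,.0f}")  # ~ $79,344
print(f"ARR in 6 months: ${arr_future:,.0f}")  # ~ $952,125
```

If the model's CoT output lands anywhere far from these numbers, one of its intermediate steps went wrong, and the visible reasoning tells you which one.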
Step 2: Structured CoT With Explicit Steps
For production systems, structured CoT is more reliable than open-ended "think step by step." Define the exact reasoning structure.
```python
def structured_cot(
    problem: str,
    reasoning_steps: list,
    model: str = "claude-sonnet-4-5"
) -> dict:
    steps_formatted = "\n".join(
        f"Step {i+1}: {step}" for i, step in enumerate(reasoning_steps)
    )
    prompt = f"""Problem: {problem}

Reason through this problem using exactly these steps:

{steps_formatted}
Step {len(reasoning_steps)+1}: State your final answer clearly.

Work through each step in order. Show your work at each step."""
    response = client.messages.create(
        model=model,
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return {
        "reasoning": response.content[0].text,
        "model": model
    }

# Business analysis example
result = structured_cot(
    problem="Should we pause our highest-spending ad set that has $800 spend and 0 conversions at day 4?",
    reasoning_steps=[
        "State the current CPA target and what 0 conversions at $800 means relative to it",
        "Check if 4 days and $800 gives enough statistical signal for a decision",
        "List what other data points we'd want before making the call",
        "State the recommendation with confidence level"
    ]
)
print(result["reasoning"])
```
Step 3: CoT With Answer Extraction
When you need to use the answer programmatically, structure the prompt to separate reasoning from the final answer.
```python
import re
import json

def cot_with_structured_answer(
    problem: str,
    answer_schema: dict,
    system_prompt: str = ""
) -> dict:
    schema_str = json.dumps(answer_schema, indent=2)
    prompt = f"""Problem: {problem}

Work through this step by step, showing your reasoning.

After your reasoning, output your final answer in this exact JSON format:
<answer>
{schema_str}
</answer>"""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        system=system_prompt if system_prompt else "You are a careful analyst who shows your reasoning.",
        messages=[{"role": "user", "content": prompt}]
    )
    full_text = response.content[0].text

    # Extract reasoning and answer separately
    answer_match = re.search(r'<answer>(.*?)</answer>', full_text, re.DOTALL)
    result = {
        "reasoning": full_text,
        "answer": None,
        "parse_error": None
    }
    if answer_match:
        try:
            result["answer"] = json.loads(answer_match.group(1).strip())
            result["reasoning"] = full_text[:full_text.find("<answer>")].strip()
        except json.JSONDecodeError as e:
            result["parse_error"] = str(e)
    return result

# Example: lead scoring
result = cot_with_structured_answer(
    problem="Score this lead: B2B software company, 50 employees, director-level contact, requested pricing, budget of $2k/month, decision in 30 days.",
    answer_schema={
        "score": 0,
        "tier": "hot|warm|cold",
        "top_signal": "string",
        "recommended_action": "string"
    }
)
print("Reasoning:")
print(result["reasoning"])
print("\nStructured Answer:")
print(json.dumps(result["answer"], indent=2))
```
Step 4: Self-Consistency CoT
Run the same problem multiple times and take the majority answer. This reduces variance on problems where a single CoT pass sometimes follows a plausible-sounding but wrong path.
```python
from collections import Counter

def self_consistent_cot(
    problem: str,
    n_samples: int = 5
) -> dict:
    answers = []
    reasonings = []
    for _ in range(n_samples):
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=400,
            temperature=0.7,  # Some variation between samples
            messages=[{
                "role": "user",
                "content": f"{problem}\n\nThink step by step, then give your final answer on the last line starting with 'Answer:'"
            }]
        )
        text = response.content[0].text
        reasonings.append(text)
        # Extract the answer from "Answer: X"
        answer_match = re.search(r'Answer:\s*(.+?)(?:\n|$)', text, re.IGNORECASE)
        if answer_match:
            answers.append(answer_match.group(1).strip())

    if not answers:
        return {"error": "Could not extract answers", "reasonings": reasonings}

    # Count votes
    answer_counts = Counter(answers)
    majority_answer, majority_count = answer_counts.most_common(1)[0]
    confidence = majority_count / n_samples
    return {
        "answer": majority_answer,
        "confidence": confidence,
        "vote_distribution": dict(answer_counts),
        "all_reasonings": reasonings,
        "n_samples": n_samples
    }

# Example: classification that sometimes goes wrong
result = self_consistent_cot(
    "A user emails: 'I need to switch from monthly to annual billing but my payment method expired.' Which team should handle this: billing or account management?",
    n_samples=5
)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%}")
print(f"Votes: {result['vote_distribution']}")
```
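The confidence score doubles as an escalation trigger: below a threshold, hand the problem to a stronger model or a human instead of trusting the majority vote. A minimal sketch (the threshold values here are illustrative assumptions, not recommendations; tune them against your own task data):

```python
# Sketch: gate on self-consistency confidence. Thresholds are
# illustrative -- tune them against your own task data.
def route_by_confidence(result: dict, threshold: float = 0.8) -> dict:
    if "error" in result:
        return {"action": "escalate_to_human", "reason": "no parseable answers"}
    if result["confidence"] >= threshold:
        return {"action": "accept", "answer": result["answer"]}
    return {
        "action": "escalate",
        "answer": result["answer"],
        "reason": f"only {result['confidence']:.0%} of samples agreed",
    }

# 4-of-5 agreement clears a 0.8 bar but would fail a 0.9 bar
decision = route_by_confidence({"answer": "billing", "confidence": 0.8})
print(decision["action"])  # accept
```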
Step 5: Tree-of-Thought for Complex Decisions
For decisions with multiple valid paths, explore several reasoning branches and evaluate them.
```python
def tree_of_thought(
    problem: str,
    n_branches: int = 3
) -> dict:
    # Step 1: Generate multiple reasoning approaches
    approaches_prompt = f"""Problem: {problem}

Generate {n_branches} different reasoning approaches for solving this problem.
Each approach should start from a different angle or assumption.
Number each approach clearly."""
    approaches_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=600,
        messages=[{"role": "user", "content": approaches_prompt}]
    )
    approaches_text = approaches_response.content[0].text

    # Step 2: Evaluate each approach and pick the best
    evaluate_prompt = f"""Problem: {problem}

Here are {n_branches} reasoning approaches:

{approaches_text}

For each approach:
1. Follow it to its conclusion
2. Rate its soundness 1-10
3. State one weakness

Then pick the best approach and give the final answer."""
    final_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": evaluate_prompt}]
    )
    return {
        "approaches": approaches_text,
        "evaluation_and_answer": final_response.content[0].text
    }
```
Step 6: When NOT to Use CoT
CoT adds tokens, which adds cost and latency. Do not use it for:
```python
# These do NOT need CoT:
SIMPLE_TASKS = [
    "Classify as positive/negative/neutral",
    "Extract the company name from this email",
    "Translate this sentence to Spanish",
    "Is this email a reply or a new thread?",
]

# These DO benefit from CoT:
COMPLEX_TASKS = [
    "Should I scale or pause this ad campaign based on these metrics?",
    "Which pricing tier does this customer's usage suggest they need?",
    "What is the root cause of this error sequence?",
    "Does this contract clause create any risk for clause 4.2?",
]

def smart_cot_router(task: str, is_complex: bool) -> str:
    if is_complex:
        return answer_with_cot(task)
    else:
        return answer_direct(task)
```
A practical rule: if the answer requires combining more than two pieces of information where each affects the next, use CoT.
What to Build Next
- Build a CoT evaluation system that scores reasoning quality automatically using a judge model
- Create task-specific CoT templates for your most common analysis tasks
- Implement progressive CoT where simple answers are direct and confidence below a threshold triggers CoT
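The progressive CoT idea in the last item can be sketched with stubbed model calls. Everything here is an assumption to adapt: swap the stubs for `answer_direct` and `answer_with_cot` from Step 1, supply a real confidence estimate (self-consistency voting or a judge model), and tune the threshold on your own data:

```python
# Sketch of progressive CoT with stubbed model calls. The stubs and
# the 0.75 threshold are illustrative assumptions, not recommendations.
def progressive_answer(task, direct_fn, cot_fn, confidence_fn, threshold=0.75):
    """Try a cheap direct answer first; fall back to CoT when the
    confidence estimate for the direct answer is below threshold."""
    direct = direct_fn(task)
    if confidence_fn(task, direct) >= threshold:
        return {"answer": direct, "used_cot": False}
    return {"answer": cot_fn(task), "used_cot": True}

# Stubbed demo: low confidence triggers the CoT fallback
result = progressive_answer(
    "multi-step question",
    direct_fn=lambda t: "quick guess",
    cot_fn=lambda t: "reasoned answer",
    confidence_fn=lambda t, a: 0.4,
)
print(result)  # {'answer': 'reasoned answer', 'used_cot': True}
```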
Related Reading
- Chain of Thought Prompting for Business Decisions
- Multi-Step Reasoning for Complex Problems
- The OpenAI o1 Reasoning Model for Business Decisions
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment