How to Implement Chain-of-Thought Reasoning
Force AI models to show their work for more accurate complex reasoning.
Jay Banlasan
The AI Systems Guy
Chain-of-thought prompting is how you get accurate answers to multi-step problems that models routinely fail at when asked to answer directly. The technique forces the model to reason through each step before committing to an answer. On complex reasoning, math, and analysis tasks, accuracy typically jumps 20-40% compared to asking for the answer directly. I use this on any task where the answer requires more than two logical steps: deal analysis, troubleshooting trees, financial calculations, and anything where a wrong intermediate assumption would corrupt the final answer.
The intuition is straightforward: when you force the model to write out its reasoning, it catches its own errors during the generation process. Asking for an answer directly skips that self-correction.
What You Need Before Starting
- An Anthropic or OpenAI API key
- A task where accuracy on multi-step problems is the bottleneck
- Python 3.10+ with the `anthropic` SDK (`pip install anthropic`)
Step 1: The Simplest CoT Pattern
The most basic chain-of-thought pattern adds one sentence: "Think step by step before answering."
```python
import anthropic

client = anthropic.Anthropic()

def answer_direct(question: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=200,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text

def answer_with_cot(question: str) -> str:
    cot_question = f"{question}\n\nThink through this step by step before giving your final answer."
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=600,
        messages=[{"role": "user", "content": cot_question}]
    )
    return response.content[0].text

# Compare
question = "A SaaS company has $50k MRR growing at 8% month-over-month. What will their ARR be in 6 months?"

print("Direct answer:")
print(answer_direct(question))
print("\nChain-of-thought answer:")
print(answer_with_cot(question))
```
The direct answer often gives you a number without the intermediate calculations visible. The CoT answer shows the compounding formula applied correctly, and you can verify each step.
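For this particular question, you can check the model's arithmetic yourself. The verification below is plain Python (no API call) that applies the same compounding formula the CoT answer should show:

```python
# Sanity-check the compounding math the model should produce.
mrr_now = 50_000   # current MRR in dollars
growth = 0.08      # 8% month-over-month
months = 6

# Compound the MRR forward, then annualize: ARR = MRR x 12
mrr_future = mrr_now * (1 + growth) ** months
arr_future = mrr_future * 12

print(f"MRR in 6 months: ${mrr_future:,.0f}")  # ~ $79,344
print(f"ARR in 6 months: ${arr_future:,.0f}")  # ~ $952,125
```

If the model's CoT output lands anywhere far from these numbers, one of its intermediate steps went wrong, and the visible reasoning tells you which one.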
Step 2: Structured CoT With Explicit Steps
For production systems, structured CoT is more reliable than open-ended "think step by step." Define the exact reasoning structure.
```python
def structured_cot(
    problem: str,
    reasoning_steps: list,
    model: str = "claude-sonnet-4-5"
) -> dict:
    steps_formatted = "\n".join(
        f"Step {i+1}: {step}" for i, step in enumerate(reasoning_steps)
    )
    prompt = f"""Problem: {problem}

Reason through this problem using exactly these steps:

{steps_formatted}
Step {len(reasoning_steps)+1}: State your final answer clearly.

Work through each step in order. Show your work at each step."""
    response = client.messages.create(
        model=model,
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return {
        "reasoning": response.content[0].text,
        "model": model
    }

# Business analysis example
result = structured_cot(
    problem="Should we pause our highest-spending ad set that has $800 spend and 0 conversions at day 4?",
    reasoning_steps=[
        "State the current CPA target and what 0 conversions at $800 means relative to it",
        "Check if 4 days and $800 gives enough statistical signal for a decision",
        "List what other data points we'd want before making the call",
        "State the recommendation with confidence level"
    ]
)
print(result["reasoning"])
```
Step 3: CoT With Answer Extraction
When you need to use the answer programmatically, structure the prompt to separate reasoning from the final answer.
```python
import re
import json

def cot_with_structured_answer(
    problem: str,
    answer_schema: dict,
    system_prompt: str = ""
) -> dict:
    schema_str = json.dumps(answer_schema, indent=2)
    prompt = f"""Problem: {problem}

Work through this step by step, showing your reasoning.

After your reasoning, output your final answer in this exact JSON format:
<answer>
{schema_str}
</answer>"""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        system=system_prompt if system_prompt else "You are a careful analyst who shows your reasoning.",
        messages=[{"role": "user", "content": prompt}]
    )
    full_text = response.content[0].text

    # Extract reasoning and answer separately
    answer_match = re.search(r'<answer>(.*?)</answer>', full_text, re.DOTALL)
    result = {
        "reasoning": full_text,
        "answer": None,
        "parse_error": None
    }
    if answer_match:
        try:
            result["answer"] = json.loads(answer_match.group(1).strip())
            result["reasoning"] = full_text[:full_text.find("<answer>")].strip()
        except json.JSONDecodeError as e:
            result["parse_error"] = str(e)
    return result

# Example: lead scoring
result = cot_with_structured_answer(
    problem="Score this lead: B2B software company, 50 employees, director-level contact, requested pricing, budget of $2k/month, decision in 30 days.",
    answer_schema={
        "score": 0,
        "tier": "hot|warm|cold",
        "top_signal": "string",
        "recommended_action": "string"
    }
)
print("Reasoning:")
print(result["reasoning"])
print("\nStructured Answer:")
print(json.dumps(result["answer"], indent=2))
```
Step 4: Self-Consistency CoT
Run the same problem multiple times and take the majority answer. This reduces variance on problems where a single CoT pass sometimes follows a plausible-sounding but wrong path.
```python
from collections import Counter

def self_consistent_cot(
    problem: str,
    n_samples: int = 5
) -> dict:
    answers = []
    reasonings = []
    for _ in range(n_samples):
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=400,
            temperature=0.7,  # Some variation between samples
            messages=[{
                "role": "user",
                "content": f"{problem}\n\nThink step by step, then give your final answer on the last line starting with 'Answer:'"
            }]
        )
        text = response.content[0].text
        reasonings.append(text)
        # Extract the answer from "Answer: X"
        answer_match = re.search(r'Answer:\s*(.+?)(?:\n|$)', text, re.IGNORECASE)
        if answer_match:
            answers.append(answer_match.group(1).strip())

    if not answers:
        return {"error": "Could not extract answers", "reasonings": reasonings}

    # Count votes
    answer_counts = Counter(answers)
    majority_answer, majority_count = answer_counts.most_common(1)[0]
    confidence = majority_count / n_samples
    return {
        "answer": majority_answer,
        "confidence": confidence,
        "vote_distribution": dict(answer_counts),
        "all_reasonings": reasonings,
        "n_samples": n_samples
    }

# Example: classification that sometimes goes wrong
result = self_consistent_cot(
    "A user emails: 'I need to switch from monthly to annual billing but my payment method expired.' Which team should handle this: billing or account management?",
    n_samples=5
)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%}")
print(f"Votes: {result['vote_distribution']}")
```
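The confidence score doubles as an escalation trigger: below a threshold, hand the problem to a stronger model or a human instead of trusting the majority vote. A minimal sketch (the threshold values here are illustrative assumptions, not recommendations; tune them against your own task data):

```python
# Sketch: gate on self-consistency confidence. Thresholds are
# illustrative -- tune them against your own task data.
def route_by_confidence(result: dict, threshold: float = 0.8) -> dict:
    if "error" in result:
        return {"action": "escalate_to_human", "reason": "no parseable answers"}
    if result["confidence"] >= threshold:
        return {"action": "accept", "answer": result["answer"]}
    return {
        "action": "escalate",
        "answer": result["answer"],
        "reason": f"only {result['confidence']:.0%} of samples agreed",
    }

# 4-of-5 agreement clears a 0.8 bar but would fail a 0.9 bar
decision = route_by_confidence({"answer": "billing", "confidence": 0.8})
print(decision["action"])  # accept
```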
Step 5: Tree-of-Thought for Complex Decisions
For decisions with multiple valid paths, explore several reasoning branches and evaluate them.
```python
def tree_of_thought(
    problem: str,
    n_branches: int = 3
) -> dict:
    # Step 1: Generate multiple reasoning approaches
    approaches_prompt = f"""Problem: {problem}

Generate {n_branches} different reasoning approaches for solving this problem.
Each approach should start from a different angle or assumption.
Number each approach clearly."""
    approaches_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=600,
        messages=[{"role": "user", "content": approaches_prompt}]
    )
    approaches_text = approaches_response.content[0].text

    # Step 2: Evaluate each approach and pick the best
    evaluate_prompt = f"""Problem: {problem}

Here are {n_branches} reasoning approaches:

{approaches_text}

For each approach:
1. Follow it to its conclusion
2. Rate its soundness 1-10
3. State one weakness

Then pick the best approach and give the final answer."""
    final_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": evaluate_prompt}]
    )
    return {
        "approaches": approaches_text,
        "evaluation_and_answer": final_response.content[0].text
    }
```
Step 6: When NOT to Use CoT
CoT adds tokens, which adds cost and latency. Do not use it for:
```python
# These do NOT need CoT:
SIMPLE_TASKS = [
    "Classify as positive/negative/neutral",
    "Extract the company name from this email",
    "Translate this sentence to Spanish",
    "Is this email a reply or a new thread?",
]

# These DO benefit from CoT:
COMPLEX_TASKS = [
    "Should I scale or pause this ad campaign based on these metrics?",
    "Which pricing tier does this customer's usage suggest they need?",
    "What is the root cause of this error sequence?",
    "Does this contract clause create any risk for clause 4.2?",
]

def smart_cot_router(task: str, is_complex: bool) -> str:
    if is_complex:
        return answer_with_cot(task)
    else:
        return answer_direct(task)
```
A practical rule: if the answer requires combining more than two pieces of information where each affects the next, use CoT.
What to Build Next
- Build a CoT evaluation system that scores reasoning quality automatically using a judge model
- Create task-specific CoT templates for your most common analysis tasks
- Implement progressive CoT where simple answers are direct and confidence below a threshold triggers CoT
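The progressive CoT idea in the last item can be sketched with stubbed model calls. Everything here is an assumption to adapt: swap the stubs for `answer_direct` and `answer_with_cot` from Step 1, supply a real confidence estimate (self-consistency voting or a judge model), and tune the threshold on your own data:

```python
# Sketch of progressive CoT with stubbed model calls. The stubs and
# the 0.75 threshold are illustrative assumptions, not recommendations.
def progressive_answer(task, direct_fn, cot_fn, confidence_fn, threshold=0.75):
    """Try a cheap direct answer first; fall back to CoT when the
    confidence estimate for the direct answer is below threshold."""
    direct = direct_fn(task)
    if confidence_fn(task, direct) >= threshold:
        return {"answer": direct, "used_cot": False}
    return {"answer": cot_fn(task), "used_cot": True}

# Stubbed demo: low confidence triggers the CoT fallback
result = progressive_answer(
    "multi-step question",
    direct_fn=lambda t: "quick guess",
    cot_fn=lambda t: "reasoned answer",
    confidence_fn=lambda t, a: 0.4,
)
print(result)  # {'answer': 'reasoned answer', 'used_cot': True}
```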
Related Reading
- Chain of Thought Prompting for Business Decisions
- Multi-Step Reasoning for Complex Problems
- The OpenAI o1 Reasoning Model for Business Decisions
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment