How to Configure Claude for JSON Output Mode
Force Claude to return structured JSON for automated data processing pipelines.
Jay Banlasan
The AI Systems Guy
Getting reliable JSON from Claude is what turns AI from a text generator into a data processing engine. The technique I use for a de facto JSON output mode combines a strong system prompt with an explicit JSON schema in the user message. Get this right and you can pipe Claude's output directly into your database, spreadsheet, or downstream API without any parsing gymnastics.
The key insight is that Claude does not have a native "JSON mode" toggle the way OpenAI does. You get reliable JSON by being explicit and specific in your prompts. Combine that with Python's json.loads() and a validation layer, and you have a pipeline that is as reliable as any other data transformation step.
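Before wiring in the API, it helps to see how small the parsing layer really is. Here is a minimal sketch of that layer, with the model's output hardcoded as a stand-in for a real response (`parse_model_output` is an illustrative helper, not part of the SDK):

```python
import json

def parse_model_output(raw: str) -> dict:
    """Parse a model response that should contain only raw JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # Surface the raw text so failures are debuggable
        raise ValueError(f"Model did not return valid JSON: {e}\nRaw: {raw!r}")

# Stand-in for a real Claude response
raw = '{"name": "Marcus Webb", "email": "[email protected]"}'
data = parse_model_output(raw)
print(data["name"])  # Marcus Webb
```

Everything else in this tutorial is prompt discipline layered on top of this one call to `json.loads()`.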
What You Need Before Starting
- Anthropic API key configured (see tutorial 001)
- anthropic Python SDK installed
- Understanding of JSON schema basics
- A specific data structure you want to extract or generate
Step 1: The Basic JSON Extraction Pattern
The core technique: tell Claude exactly what JSON structure to return in the system prompt, then validate the output:
```python
import os
import json
import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))


def extract_json(text: str, schema_description: str) -> dict:
    """
    Extract structured JSON from unstructured text.

    Args:
        text: The raw text to extract from
        schema_description: Description of the JSON structure you want

    Returns:
        Parsed dictionary
    """
    system_prompt = f"""You are a data extraction engine.
Extract information from text and return ONLY valid JSON.
No markdown. No code blocks. No explanations. Raw JSON only.

Required structure:
{schema_description}"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1000,
        system=system_prompt,
        messages=[
            {"role": "user", "content": f"Extract from this text:\n\n{text}"}
        ]
    )

    raw = response.content[0].text.strip()
    return json.loads(raw)
```
```python
# Example: Extract contact information
schema = """{
    "name": "string",
    "company": "string or null",
    "email": "string or null",
    "phone": "string or null"
}"""

text = "Hi Jay, I'm Marcus Webb from Apex Consulting. You can reach me at [email protected] or call 555-0192."
result = extract_json(text, schema)
print(result)
# Output: {"name": "Marcus Webb", "company": "Apex Consulting", "email": "[email protected]", "phone": "555-0192"}
```
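Even before adding a validation library (that is Step 2), a plain key check catches the most common failure mode: Claude silently omitting a field. A minimal stdlib sketch, where `check_keys` and `REQUIRED_KEYS` are illustrative names:

```python
REQUIRED_KEYS = {"name", "company", "email", "phone"}

def check_keys(data: dict, required: set) -> dict:
    """Raise if any expected key is missing from the extracted dict."""
    missing = required - data.keys()
    if missing:
        raise KeyError(f"Extraction missing keys: {sorted(missing)}")
    return data

result = {"name": "Marcus Webb", "company": "Apex Consulting",
          "email": "[email protected]", "phone": "555-0192"}
check_keys(result, REQUIRED_KEYS)  # passes silently when all keys present
```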
Step 2: Use Pydantic for Schema Validation
Validate the JSON matches your expected structure using Pydantic:
pip install pydantic
```python
from pydantic import BaseModel, ValidationError
from typing import Optional


class ContactInfo(BaseModel):
    name: str
    company: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


def extract_contact(text: str) -> ContactInfo:
    """Extract and validate contact information from text."""
    schema = """{
        "name": "full name as string",
        "company": "company name or null",
        "email": "email address or null",
        "phone": "phone number as string or null"
    }"""
    raw = extract_json(text, schema)
    try:
        return ContactInfo(**raw)
    except ValidationError as e:
        print(f"Validation failed: {e}")
        raise


contact = extract_contact("Hi Jay, I'm Marcus Webb from Apex Consulting. Reach me at [email protected].")
print(contact.model_dump())
print(f"Email: {contact.email}")
```
Step 3: Extract Arrays of Objects
A common pattern: extract multiple items from a document:
```python
from typing import List


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float


class Invoice(BaseModel):
    vendor: str
    invoice_number: str
    date: str
    line_items: List[LineItem]
    subtotal: float
    tax: Optional[float] = None
    total_due: float


def parse_invoice(invoice_text: str) -> Invoice:
    """Parse an invoice document into structured data."""
    system = """You are an invoice parser. Extract all invoice data and return valid JSON only.
No markdown. No code blocks. Raw JSON only.

Required structure:
{
    "vendor": "company name",
    "invoice_number": "invoice number string",
    "date": "date in YYYY-MM-DD format",
    "line_items": [
        {
            "description": "item description",
            "quantity": integer,
            "unit_price": float,
            "total": float
        }
    ],
    "subtotal": float,
    "tax": float or null,
    "total_due": float
}"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2000,
        system=system,
        messages=[{"role": "user", "content": invoice_text}]
    )

    raw = json.loads(response.content[0].text.strip())
    return Invoice(**raw)
```
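Because the numbers come from a language model, it is worth cross-checking the arithmetic rather than trusting it. A sketch of such a consistency check (`totals_consistent` is an illustrative helper operating on plain dicts before Pydantic validation):

```python
import math

def totals_consistent(line_items: list[dict], subtotal: float) -> bool:
    """Verify each line's quantity * unit_price and the subtotal sum.

    Extracted numbers come from a language model, so recompute the
    arithmetic instead of trusting it.
    """
    for item in line_items:
        if not math.isclose(item["quantity"] * item["unit_price"],
                            item["total"], rel_tol=1e-6):
            return False
    return math.isclose(sum(i["total"] for i in line_items),
                        subtotal, rel_tol=1e-6)

items = [{"description": "Widget", "quantity": 3, "unit_price": 9.50, "total": 28.50}]
print(totals_consistent(items, 28.50))  # True
```

If the check fails, you can re-prompt with the discrepancy or flag the invoice for manual review.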
Step 4: Handle JSON Parsing Failures Gracefully
Claude occasionally returns JSON with minor formatting issues. Add a recovery layer:
```python
import re


def safe_json_extract(text: str, schema_description: str, retries: int = 2) -> dict:
    """
    Extract JSON with automatic retry on parse failure.
    """
    for attempt in range(retries + 1):
        try:
            system_prompt = f"""Return ONLY valid JSON matching this structure:
{schema_description}

Rules:
- No markdown formatting
- No ```json``` code blocks
- No explanatory text before or after
- Start your response with {{ and end with }}"""

            # On retry, be more explicit
            if attempt > 0:
                system_prompt += "\n\nIMPORTANT: Your last response could not be parsed. Return only raw JSON starting with { and ending with }"

            response = client.messages.create(
                model="claude-opus-4-5",
                max_tokens=1000,
                system=system_prompt,
                messages=[{"role": "user", "content": text}]
            )

            raw = response.content[0].text.strip()

            # Strip markdown code blocks if present
            raw = re.sub(r'^```(?:json)?\s*', '', raw)
            raw = re.sub(r'\s*```$', '', raw)
            raw = raw.strip()

            return json.loads(raw)
        except json.JSONDecodeError as e:
            if attempt == retries:
                raise ValueError(f"Failed to parse JSON after {retries + 1} attempts: {e}\nRaw output: {raw}")
            print(f"JSON parse failed on attempt {attempt + 1}, retrying...")

    raise ValueError("Should not reach here")
```
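Those fence-stripping regexes are easy to get subtly wrong, so it can help to pull them into a small helper you can unit-test in isolation. A sketch (`strip_code_fences` is an illustrative name, not an SDK function):

```python
import re

def strip_code_fences(raw: str) -> str:
    """Remove a leading ```json (or bare ```) fence and a trailing ``` if present."""
    raw = raw.strip()
    raw = re.sub(r'^```(?:json)?\s*', '', raw)
    raw = re.sub(r'\s*```$', '', raw)
    return raw.strip()

print(strip_code_fences('```json\n{"ok": true}\n```'))  # {"ok": true}
print(strip_code_fences('{"ok": true}'))                # unchanged: {"ok": true}
```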
Step 5: Build a Reusable Extraction Pipeline
```python
from typing import Type, TypeVar
from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def extract_structured(
    text: str,
    model_class: Type[T],
    context: str = ""
) -> T:
    """
    Generic structured extraction using a Pydantic model as the schema.

    Args:
        text: Text to extract from
        model_class: Pydantic model defining the output schema
        context: Optional context about the document type

    Returns:
        Validated instance of model_class
    """
    schema = model_class.model_json_schema()
    schema_str = json.dumps(schema, indent=2)

    system = f"""You are a data extraction specialist.
{"Context: " + context if context else ""}
Extract information and return ONLY valid JSON matching this schema:
{schema_str}

Rules: Raw JSON only. No markdown. No explanations. Start with {{ end with }}"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2000,
        system=system,
        messages=[{"role": "user", "content": text}]
    )

    raw = response.content[0].text.strip()
    raw = re.sub(r'^```(?:json)?\s*', '', raw)
    raw = re.sub(r'\s*```$', '', raw)
    data = json.loads(raw.strip())
    return model_class(**data)


# Use it with any Pydantic model
class JobPosting(BaseModel):
    title: str
    company: str
    location: str
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None
    remote: bool
    required_skills: List[str]


job_text = "Senior Python Engineer at TechCorp in Austin, TX. Remote OK. $120k-$160k. Must know Python, AWS, PostgreSQL."
job = extract_structured(job_text, JobPosting, context="job posting")
print(job.model_dump())
```
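It is worth printing the schema Pydantic generates, since that JSON is exactly what ends up in your system prompt. A small sketch (the `JobPosting` model is redefined here so the snippet stands alone):

```python
import json
from typing import List, Optional
from pydantic import BaseModel

class JobPosting(BaseModel):
    title: str
    company: str
    location: str
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None
    remote: bool
    required_skills: List[str]

schema = JobPosting.model_json_schema()
print(json.dumps(schema, indent=2))
# Fields without defaults appear in the schema's "required" list,
# which is a strong signal to Claude about what it must not omit.
print(sorted(schema["required"]))
```

If the printed schema is confusing to you, it will be confusing to the model too; field names and `description` metadata are your main levers.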
What to Build Next
- Feed the structured output directly into a SQLite or PostgreSQL database
- Build a bulk document processor that extracts JSON from hundreds of files and writes to a spreadsheet
- Add a confidence score field to your schema so Claude tells you how certain it is about each field
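The confidence-score idea from the list above can be modeled as paired fields in the Pydantic schema; this is one possible shape, not a prescribed one:

```python
from typing import Optional
from pydantic import BaseModel, Field

class ScoredContact(BaseModel):
    name: str
    name_confidence: float = Field(ge=0.0, le=1.0)
    email: Optional[str] = None
    email_confidence: float = Field(ge=0.0, le=1.0)

# Claude fills in the confidence values alongside each extracted field
c = ScoredContact(name="Marcus Webb", name_confidence=0.95,
                  email=None, email_confidence=0.40)
print(c.name_confidence)  # 0.95
```

The `ge`/`le` constraints mean out-of-range scores fail validation instead of silently polluting your data.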
Related Reading
- How to Set Up Anthropic Claude with System Prompts - The system prompt foundation this tutorial builds on
- How to Set Up OpenAI Function Calling - OpenAI's native structured output approach for comparison
- How to Handle AI API Rate Limits Gracefully - Production reliability for high-volume extraction pipelines
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment