How to Configure Claude for JSON Output Mode

Force Claude to return structured JSON for automated data processing pipelines.

Jay Banlasan

The AI Systems Guy

Getting reliable JSON from Claude is what turns AI from a text generator into a data processing engine. The technique I use combines a strict system prompt with an explicit description of the JSON structure you want back. When you get this right, you can pipe Claude's output directly into your database, spreadsheet, or downstream API with very little parsing gymnastics.

The key insight is that Claude does not have a native "JSON mode" toggle the way OpenAI does. You get reliable JSON by being explicit and specific in your prompts. Combine that with Python's json.loads() and a validation layer, and you have a pipeline that is as reliable as any other data transformation step.
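
As a minimal illustration of the parse-then-validate idea, the sketch below runs json.loads() on a raw string and then checks that the expected keys are present. No API call is involved: the sample string is hard-coded, and `parse_and_check` and its required-key set are hypothetical names chosen for this example. The steps that follow replace this hand-rolled check with Pydantic.

```python
import json


def parse_and_check(raw: str, required_keys: set[str]) -> dict:
    """Parse a JSON string and confirm it is an object with the expected keys."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object, got {type(data).__name__}")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


# Hypothetical model output, hard-coded so the sketch runs offline
sample = '{"name": "Marcus Webb", "company": "Apex Consulting"}'
record = parse_and_check(sample, {"name", "company"})
print(record["name"])
```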

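What You Need Before Starting

The code in this guide uses the anthropic and python-dotenv packages and reads your API key from an ANTHROPIC_API_KEY environment variable (for example, from a .env file). Step 2 onward also uses pydantic.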

Step 1: The Basic JSON Extraction Pattern

The core technique is to tell Claude exactly what JSON structure to return in the system prompt, then parse the output with json.loads():

import os
import json
import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def extract_json(text: str, schema_description: str) -> dict:
    """
    Extract structured JSON from unstructured text.
    
    Args:
        text: The raw text to extract from
        schema_description: Description of the JSON structure you want
    
    Returns:
        Parsed dictionary
    """
    system_prompt = f"""You are a data extraction engine. 
Extract information from text and return ONLY valid JSON.
No markdown. No code blocks. No explanations. Raw JSON only.

Required structure:
{schema_description}"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1000,
        system=system_prompt,
        messages=[
            {"role": "user", "content": f"Extract from this text:\n\n{text}"}
        ]
    )
    
    raw = response.content[0].text.strip()
    return json.loads(raw)


# Example: Extract contact information
schema = """{
  "name": "string",
  "company": "string or null",
  "email": "string or null",
  "phone": "string or null"
}"""

text = "Hi Jay, I'm Marcus Webb from Apex Consulting. You can reach me at [email protected] or call 555-0192."
result = extract_json(text, schema)
print(result)
# Example output (print shows a Python dict; exact values depend on the model's response):
# {'name': 'Marcus Webb', 'company': 'Apex Consulting', 'email': '[email protected]', 'phone': '555-0192'}
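
One failure mode the function above does not cover is truncation: if the response hits the max_tokens limit, the JSON is cut off mid-object and json.loads() fails with a confusing error. The Messages API reports this through the response's stop_reason field. The sketch below checks that field before parsing. `ensure_complete` is a hypothetical helper name, and the SimpleNamespace objects are stand-ins shaped like an API response so the example runs without a network call.

```python
from types import SimpleNamespace


def ensure_complete(response) -> str:
    """Return the response text, or raise if the model hit the token limit."""
    # stop_reason is "max_tokens" when generation was cut off at the limit
    if response.stop_reason == "max_tokens":
        raise ValueError("Response was truncated; raise max_tokens or shrink the input")
    return response.content[0].text.strip()


# Stand-in objects that mimic the fields used above (not real API responses)
complete = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(text='{"name": "Marcus Webb"}')],
)
truncated = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(text='{"name": "Marc')],
)

print(ensure_complete(complete))
try:
    ensure_complete(truncated)
except ValueError as exc:
    print(f"Rejected: {exc}")
```

In extract_json, a check like this would sit between the API call and the json.loads() call.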

Step 2: Use Pydantic for Schema Validation

Validate the JSON matches your expected structure using Pydantic:

# Install first (run in your shell, not in Python): pip install pydantic
from pydantic import BaseModel, ValidationError
from typing import Optional


class ContactInfo(BaseModel):
    name: str
    company: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


def extract_contact(text: str) -> ContactInfo:
    """Extract and validate contact information from text."""
    
    schema = """
{
  "name": "full name as string",
  "company": "company name or null",
  "email": "email address or null",
  "phone": "phone number as string or null"
}"""
    
    raw = extract_json(text, schema)
    
    try:
        return ContactInfo(**raw)
    except ValidationError as e:
        print(f"Validation failed: {e}")
        raise


contact = extract_contact("Hi Jay, I'm Marcus Webb from Apex Consulting. Reach me at [email protected].")
print(contact.model_dump())
print(f"Email: {contact.email}")

Step 3: Extract Arrays of Objects

A common pattern is extracting multiple items from a single document:

from typing import List


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float


class Invoice(BaseModel):
    vendor: str
    invoice_number: str
    date: str
    line_items: List[LineItem]
    subtotal: float
    tax: Optional[float] = None
    total_due: float


def parse_invoice(invoice_text: str) -> Invoice:
    """Parse an invoice document into structured data."""
    
    system = """You are an invoice parser. Extract all invoice data and return valid JSON only.
No markdown. No code blocks. Raw JSON only.

Required structure:
{
  "vendor": "company name",
  "invoice_number": "invoice number string",
  "date": "date in YYYY-MM-DD format",
  "line_items": [
    {
      "description": "item description",
      "quantity": integer,
      "unit_price": float,
      "total": float
    }
  ],
  "subtotal": float,
  "tax": float or null,
  "total_due": float
}"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2000,
        system=system,
        messages=[{"role": "user", "content": invoice_text}]
    )
    
    raw = json.loads(response.content[0].text.strip())
    return Invoice(**raw)
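
Type validation confirms that each field has the right shape, but not that the numbers agree with each other. A minimal sketch of an arithmetic cross-check follows. It works on a plain dict (for example, the result of invoice.model_dump()). `check_invoice_totals` is a hypothetical helper, and it assumes tax is simply added to the subtotal with no discounts or shipping, which may not hold for your invoices.

```python
import math


def check_invoice_totals(invoice: dict, tolerance: float = 0.01) -> list[str]:
    """Return a list of arithmetic inconsistencies found in a parsed invoice dict."""
    problems = []
    for i, item in enumerate(invoice["line_items"]):
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(expected, item["total"], abs_tol=tolerance):
            problems.append(f"line item {i}: expected total {expected}, got {item['total']}")

    line_sum = sum(item["total"] for item in invoice["line_items"])
    if not math.isclose(line_sum, invoice["subtotal"], abs_tol=tolerance):
        problems.append(f"subtotal {invoice['subtotal']} does not match line items sum {line_sum}")

    # Assumption: total_due = subtotal + tax (tax may be null/None)
    expected_due = invoice["subtotal"] + (invoice.get("tax") or 0.0)
    if not math.isclose(expected_due, invoice["total_due"], abs_tol=tolerance):
        problems.append(f"total_due {invoice['total_due']} does not match subtotal plus tax {expected_due}")
    return problems


# Hand-written sample data for illustration, not real model output
good = {
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0, "total": 20.0},
        {"description": "Setup", "quantity": 1, "unit_price": 50.0, "total": 50.0},
    ],
    "subtotal": 70.0,
    "tax": 5.6,
    "total_due": 75.6,
}
bad = {**good, "subtotal": 80.0}

print(check_invoice_totals(good))
print(check_invoice_totals(bad))
```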

Step 4: Handle JSON Parsing Failures Gracefully

Claude occasionally returns JSON with minor formatting issues. Add a recovery layer:

import re


def safe_json_extract(text: str, schema_description: str, retries: int = 2) -> dict:
    """
    Extract JSON with automatic retry on parse failure.
    """
    
    for attempt in range(retries + 1):
        try:
            system_prompt = f"""Return ONLY valid JSON matching this structure:
{schema_description}

Rules:
- No markdown formatting
- No ```json``` code blocks  
- No explanatory text before or after
- Start your response with {{ and end with }}"""

            # On retry, be more explicit
            if attempt > 0:
                system_prompt += "\n\nIMPORTANT: Your last response could not be parsed. Return only raw JSON starting with { and ending with }"

            response = client.messages.create(
                model="claude-opus-4-5",
                max_tokens=1000,
                system=system_prompt,
                messages=[{"role": "user", "content": text}]
            )
            
            raw = response.content[0].text.strip()
            
            # Strip markdown code blocks if present
            raw = re.sub(r'^```(?:json)?\s*', '', raw)
            raw = re.sub(r'\s*```$', '', raw)
            raw = raw.strip()
            
            return json.loads(raw)
            
        except json.JSONDecodeError as e:
            if attempt == retries:
                raise ValueError(f"Failed to parse JSON after {retries + 1} attempts: {e}\nRaw output: {raw}")
            print(f"JSON parse failed on attempt {attempt + 1}, retrying...")
    
    raise ValueError("Should not reach here")
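
The regex cleanup above handles code fences at the very start and end of the response, but not prose wrapped around the JSON. One alternative, sketched below, scans for the first position where a JSON object parses cleanly using the standard library's json.JSONDecoder.raw_decode(). `first_json_object` is a hypothetical helper name, and it only looks for top-level objects, not arrays.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, ignoring surrounding prose or fences."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores anything that follows it
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one
            start = text.find("{", start + 1)
    raise ValueError("No JSON object found in text")


# Hand-written example of a chatty response, for illustration only
messy = (
    "Here is the data:\n"
    "```json\n"
    '{"name": "Marcus Webb", "phone": "555-0192"}\n'
    "```\n"
    "Let me know if you need more."
)
print(first_json_object(messy))
```

Because this silently discards everything outside the object, it is best used as a fallback after a strict json.loads() attempt rather than as the default path.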

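Step 5: Build a Reusable Extraction Pipeline

Generalize the pattern by generating the schema from the Pydantic model itself with model_json_schema(), so the prompt and the validator share one definition: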

import json
import re
from typing import List, Optional, Type, TypeVar
from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def extract_structured(
    text: str,
    model_class: Type[T],
    context: str = ""
) -> T:
    """
    Generic structured extraction using a Pydantic model as the schema.
    
    Args:
        text: Text to extract from
        model_class: Pydantic model defining the output schema
        context: Optional context about the document type
    
    Returns:
        Validated instance of model_class
    """
    
    schema = model_class.model_json_schema()
    schema_str = json.dumps(schema, indent=2)
    
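    # Build the optional context line up front to keep the prompt template readable
    context_line = f"Context: {context}\n" if context else ""

    system = f"""You are a data extraction specialist.
{context_line}Extract information and return ONLY valid JSON matching this schema: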

{schema_str}

Rules: Raw JSON only. No markdown. No explanations. Start with {{ end with }}"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2000,
        system=system,
        messages=[{"role": "user", "content": text}]
    )
    
    raw = response.content[0].text.strip()
    raw = re.sub(r'^```(?:json)?\s*', '', raw)
    raw = re.sub(r'\s*```$', '', raw)
    
    data = json.loads(raw.strip())
    return model_class(**data)


# Use it with any Pydantic model
class JobPosting(BaseModel):
    title: str
    company: str
    location: str
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None
    remote: bool
    required_skills: List[str]


job_text = "Senior Python Engineer at TechCorp in Austin, TX. Remote OK. $120k-$160k. Must know Python, AWS, PostgreSQL."
job = extract_structured(job_text, JobPosting, context="job posting")
print(job.model_dump())
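
For an automated pipeline you will usually run many documents, and one bad response should not stop the whole batch. The sketch below shows one way to collect successes and failures separately. `run_batch` is a hypothetical helper, and json.loads stands in for the extractor so the example runs offline.

```python
import json
from typing import Callable, TypeVar

R = TypeVar("R")


def run_batch(
    texts: list[str],
    extractor: Callable[[str], R],
) -> tuple[list[R], list[tuple[int, str]]]:
    """Apply extractor to each text; return (results, failures as (index, error message))."""
    results: list[R] = []
    failures: list[tuple[int, str]] = []
    for index, text in enumerate(texts):
        try:
            results.append(extractor(text))
        except Exception as exc:  # broad on purpose: parse, validation, and API errors all land here
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return results, failures


# Stand-in extractor: json.loads instead of a real model call
docs = ['{"title": "Engineer"}', "not json"]
parsed, failed = run_batch(docs, json.loads)
print(parsed)
print(failed)
```

With the function from this step, the extractor could be a lambda such as `lambda t: extract_structured(t, JobPosting, context="job posting")`.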
