How to Use Structured Outputs with JSON Schema
Force AI models to return data matching your exact JSON schema.
Jay Banlasan
The AI Systems Guy
Unstructured AI output is a pipeline killer. When you ask a model to return JSON and it wraps it in markdown code fences, adds commentary, or changes field names between calls, your downstream code breaks. Structured outputs with JSON schema validation solve this. I use this pattern in every system that moves AI data into a database, a CRM, or another API.
The business case is straightforward: structured outputs let you treat AI like a function with a guaranteed return type. Your code stops doing brittle string parsing and starts doing real logic. That reliability is what separates a toy demo from a production system.
What You Need Before Starting
- Python 3.9+
- OpenAI API key (GPT-4o supports native structured outputs) or Anthropic key
- openai>=1.40.0 and pydantic packages
Step 1: Install Dependencies
```
pip install openai pydantic
```
Pydantic handles schema generation and validation. OpenAI's response_format parameter enforces the schema at the model level: generation is constrained to match your schema, so the API returns conforming JSON or an explicit refusal, never malformed output.
Step 2: Define Your Schema with Pydantic
Define what you want back. This example extracts lead information from a sales call transcript.
```python
from pydantic import BaseModel
from typing import Optional

class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    phone: Optional[str]
    pain_point: str
    budget_mentioned: bool
    next_step: str
    urgency_score: int  # 1-10
```
Pydantic auto-generates the JSON schema from this class. You do not write raw JSON schema by hand.
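To see what the model will be held to, you can print the generated schema locally with no API call. A quick sketch using a trimmed-down version of the model above:

```python
import json
from typing import Optional
from pydantic import BaseModel

# Trimmed-down version of the model above, just to inspect the schema
class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    urgency_score: int  # 1-10

# model_json_schema() returns the JSON schema as a plain dict
schema = LeadExtraction.model_json_schema()
print(json.dumps(schema, indent=2))

# In Pydantic v2, Optional fields without a default are still required;
# they just also accept null
print(schema["type"])  # object
print(sorted(schema["required"]))
```

Inspecting this output is the fastest way to debug schema mismatches before any tokens are spent.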
Step 3: Use OpenAI's Native Structured Outputs
OpenAI's parse method enforces the schema server-side. If the model output does not match, you get an error, not garbage data.
```python
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional

client = OpenAI(api_key="YOUR_API_KEY")

class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    phone: Optional[str]
    pain_point: str
    budget_mentioned: bool
    next_step: str
    urgency_score: int

transcript = """
Sarah from Acme Corp called asking about our automation package.
She said they're wasting 20 hours a week on manual data entry.
Budget is around 5k/month. She wants to start next quarter.
Reach her at [email protected].
"""

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract lead information from this sales call transcript."},
        {"role": "user", "content": transcript}
    ],
    response_format=LeadExtraction
)

lead = response.choices[0].message.parsed
print(lead.name)              # Sarah
print(lead.urgency_score)     # e.g., 6
print(lead.budget_mentioned)  # True
```
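The parsed result is a regular Pydantic instance, so handing it to a database or CRM is one call away. A sketch using a manually constructed lead standing in for the API result (sample values are made up):

```python
from typing import Optional
from pydantic import BaseModel

class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    phone: Optional[str]
    pain_point: str
    budget_mentioned: bool
    next_step: str
    urgency_score: int

# Stand-in for response.choices[0].message.parsed
lead = LeadExtraction(
    name="Sarah",
    company="Acme Corp",
    email=None,
    phone=None,
    pain_point="20 hours a week lost to manual data entry",
    budget_mentioned=True,
    next_step="Start next quarter",
    urgency_score=6,
)

# Plain dict for an ORM insert, or a JSON string for an HTTP call to a CRM
row = lead.model_dump()
payload = lead.model_dump_json()
print(row["company"])  # Acme Corp
```

This is the payoff of the "function with a guaranteed return type" framing: downstream code works with typed attributes, not string parsing.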
Step 4: Handle Nested Schemas
Real-world extractions often have nested data. Pydantic handles this cleanly.
```python
from pydantic import BaseModel
from typing import List, Optional

class ContactInfo(BaseModel):
    email: Optional[str]
    phone: Optional[str]
    preferred_channel: str

class CompetitorMention(BaseModel):
    name: str
    sentiment: str  # "positive", "negative", "neutral"

class SalesCallAnalysis(BaseModel):
    lead_name: str
    company: str
    contact: ContactInfo
    pain_points: List[str]
    competitors_mentioned: List[CompetitorMention]
    deal_stage: str  # "discovery", "proposal", "negotiation", "closed"
    follow_up_date: Optional[str]
    confidence_score: int  # 1-10

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze this sales call and extract all relevant data."},
        {"role": "user", "content": transcript}
    ],
    response_format=SalesCallAnalysis
)

analysis = response.choices[0].message.parsed
for pain in analysis.pain_points:
    print(f"- {pain}")
```
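Before wiring the nested model into an API call, you can sanity-check it against a hand-written dict with `model_validate`, which recursively builds the nested models. A sketch (the sample data is made up):

```python
from typing import List, Optional
from pydantic import BaseModel

class ContactInfo(BaseModel):
    email: Optional[str]
    phone: Optional[str]
    preferred_channel: str

class CompetitorMention(BaseModel):
    name: str
    sentiment: str

class SalesCallAnalysis(BaseModel):
    lead_name: str
    company: str
    contact: ContactInfo
    pain_points: List[str]
    competitors_mentioned: List[CompetitorMention]
    deal_stage: str
    follow_up_date: Optional[str]
    confidence_score: int

# model_validate recursively constructs the nested models from plain dicts
analysis = SalesCallAnalysis.model_validate({
    "lead_name": "Sarah",
    "company": "Acme Corp",
    "contact": {"email": None, "phone": None, "preferred_channel": "email"},
    "pain_points": ["manual data entry", "slow reporting"],
    "competitors_mentioned": [{"name": "RivalCo", "sentiment": "neutral"}],
    "deal_stage": "discovery",
    "follow_up_date": None,
    "confidence_score": 7,
})

print(analysis.contact.preferred_channel)      # email
print(analysis.competitors_mentioned[0].name)  # RivalCo
```

If your test dict fails validation here, the model would have rejected real extractions too, so this catches schema design bugs early.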
Step 5: Validate with Explicit JSON Schema for Non-OpenAI Models
When you use Anthropic or other providers that do not support native structured outputs, enforce the schema yourself with Pydantic validation after the fact.
````python
import anthropic
import json
from pydantic import BaseModel, ValidationError

client = anthropic.Anthropic(api_key="YOUR_KEY")

class AdConcept(BaseModel):
    headline: str
    body_copy: str
    cta: str
    target_audience: str
    hook_type: str

def extract_structured(prompt: str, schema_class) -> BaseModel:
    schema = schema_class.model_json_schema()
    system = f"""You must respond ONLY with valid JSON matching this exact schema.
No markdown. No explanation. Just the JSON object.

Schema:
{json.dumps(schema, indent=2)}"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    )

    raw = response.content[0].text.strip()

    # Strip markdown fences if the model adds them anyway
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]

    try:
        data = json.loads(raw)
        return schema_class(**data)
    except (json.JSONDecodeError, ValidationError) as e:
        raise ValueError(f"Model returned invalid schema: {e}\nRaw: {raw}")

concept = extract_structured(
    "Generate an ad concept for a B2B SaaS product that saves teams 10 hours a week.",
    AdConcept
)
print(concept.headline)
````
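The fence-stripping and validation fallback can be exercised without an API call by feeding it a canned response. A sketch with a hypothetical `parse_response` helper that mirrors the logic above (the fence string is built programmatically to keep the example self-contained):

```python
import json
from pydantic import BaseModel, ValidationError

class AdConcept(BaseModel):
    headline: str
    body_copy: str
    cta: str
    target_audience: str
    hook_type: str

FENCE = "`" * 3  # a literal markdown code fence

def parse_response(raw: str, schema_class):
    """Mirror of the fence-stripping + validation logic above."""
    raw = raw.strip()
    if raw.startswith(FENCE):
        raw = raw.split(FENCE)[1]
        if raw.startswith("json"):
            raw = raw[4:]
    try:
        data = json.loads(raw)
        return schema_class(**data)
    except (json.JSONDecodeError, ValidationError) as e:
        raise ValueError(f"Model returned invalid schema: {e}\nRaw: {raw}")

# Simulated model reply that wraps the JSON in a fence despite instructions
body = json.dumps({
    "headline": "Win Back 10 Hours",
    "body_copy": "Automate the busywork.",
    "cta": "Book a demo",
    "target_audience": "ops leads",
    "hook_type": "time-saving",
})
canned = f"{FENCE}json\n{body}\n{FENCE}"

concept = parse_response(canned, AdConcept)
print(concept.headline)  # Win Back 10 Hours
```

Keeping the parsing logic in a separate function like this makes it unit-testable, which matters once the fallback path is load-bearing in production.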
Step 6: Build a Reusable Structured Extractor
Wrap the pattern into a utility function you can drop into any project.
```python
from openai import OpenAI
from pydantic import BaseModel
from typing import Type, TypeVar

T = TypeVar("T", bound=BaseModel)
client = OpenAI(api_key="YOUR_API_KEY")

def extract(
    content: str,
    schema: Type[T],
    system_prompt: str = "Extract the requested information from the provided text.",
    model: str = "gpt-4o-2024-08-06"
) -> T:
    response = client.beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content}
        ],
        response_format=schema
    )
    return response.choices[0].message.parsed

# Usage anywhere in your codebase
result = extract(transcript, LeadExtraction)
result = extract(article_text, SalesCallAnalysis)
```
What to Build Next
- Add enum types to your Pydantic models to constrain field values to a fixed set of allowed options
- Build a structured output validator that runs on a sample of your production extractions weekly to catch model drift
- Chain structured outputs so the output of one model call becomes the validated input to the next step
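The first suggestion above can be sketched with `enum.Enum` (or `typing.Literal`): invalid values then fail validation instead of slipping through as free-form strings. The `Sentiment` values mirror the comment in the earlier `CompetitorMention` model; everything else here is illustrative:

```python
from enum import Enum
from pydantic import BaseModel, ValidationError

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class CompetitorMention(BaseModel):
    name: str
    sentiment: Sentiment  # only the three values above pass validation

# Valid value: the string is coerced into the enum member
ok = CompetitorMention(name="RivalCo", sentiment="neutral")
print(ok.sentiment.value)  # neutral

# Invalid value: Pydantic rejects it instead of storing a free-form string
try:
    CompetitorMention(name="RivalCo", sentiment="meh")
except ValidationError as e:
    print("rejected:", e.error_count(), "error(s)")
```

Enums also flow into the generated JSON schema, so schema-enforcing providers constrain the model to the allowed values at generation time.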
Related Reading
- How to Write System Prompts That Control AI Behavior - system prompt design matters for reliable schema compliance
- How to Build AI Guardrails for Safe Outputs - layer safety checks on top of structured extraction
- How to Build Persona-Based AI Assistants - persona systems that return structured data for CRM sync
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment