How to Use Structured Outputs with JSON Schema
Force AI models to return data matching your exact JSON schema.
Jay Banlasan
The AI Systems Guy
Unstructured AI output is a pipeline killer. When you ask a model to return JSON and it wraps it in markdown code fences, adds commentary, or changes field names between calls, your downstream code breaks. Structured outputs with JSON schema validation solve this. I use this pattern in every system that moves AI data into a database, a CRM, or another API.
The business case is straightforward: structured outputs let you treat AI like a function with a guaranteed return type. Your code stops doing brittle string parsing and starts doing real logic. That reliability is what separates a toy demo from a production system.
What You Need Before Starting
- Python 3.9+
- OpenAI API key (GPT-4o supports native structured outputs) or Anthropic key
- openai>=1.40.0 and pydantic packages
Step 1: Install Dependencies
```
pip install openai pydantic
```
Pydantic handles schema generation and validation. OpenAI's response_format parameter enforces the schema at the model level: generation is constrained to match your schema, so the API returns conforming JSON or an explicit refusal, never malformed output.
Step 2: Define Your Schema with Pydantic
Define what you want back. This example extracts lead information from a sales call transcript.
```python
from pydantic import BaseModel
from typing import Optional

class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    phone: Optional[str]
    pain_point: str
    budget_mentioned: bool
    next_step: str
    urgency_score: int  # 1-10
```
Pydantic auto-generates the JSON schema from this class. You do not write raw JSON schema by hand.
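To see what the model will be held to, you can print the generated schema locally with no API call. A quick sketch using a trimmed-down version of the model above:

```python
import json
from typing import Optional
from pydantic import BaseModel

# Trimmed-down version of the model above, just to inspect the schema
class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    urgency_score: int  # 1-10

# model_json_schema() returns the JSON schema as a plain dict
schema = LeadExtraction.model_json_schema()
print(json.dumps(schema, indent=2))

# In Pydantic v2, Optional fields without a default are still required;
# they just also accept null
print(schema["type"])  # object
print(sorted(schema["required"]))
```

Inspecting this output is the fastest way to debug schema mismatches before any tokens are spent.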
Step 3: Use OpenAI's Native Structured Outputs
OpenAI's parse method enforces the schema server-side. If the model output does not match, you get an error, not garbage data.
```python
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional

client = OpenAI(api_key="YOUR_API_KEY")

class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    phone: Optional[str]
    pain_point: str
    budget_mentioned: bool
    next_step: str
    urgency_score: int

transcript = """
Sarah from Acme Corp called asking about our automation package.
She said they're wasting 20 hours a week on manual data entry.
Budget is around 5k/month. She wants to start next quarter.
Reach her at [email protected].
"""

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract lead information from this sales call transcript."},
        {"role": "user", "content": transcript}
    ],
    response_format=LeadExtraction
)

lead = response.choices[0].message.parsed
print(lead.name)              # Sarah
print(lead.urgency_score)     # e.g., 6
print(lead.budget_mentioned)  # True
```
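The parsed result is a regular Pydantic instance, so handing it to a database or CRM is one call away. A sketch using a manually constructed lead standing in for the API result (sample values are made up):

```python
from typing import Optional
from pydantic import BaseModel

class LeadExtraction(BaseModel):
    name: str
    company: str
    email: Optional[str]
    phone: Optional[str]
    pain_point: str
    budget_mentioned: bool
    next_step: str
    urgency_score: int

# Stand-in for response.choices[0].message.parsed
lead = LeadExtraction(
    name="Sarah",
    company="Acme Corp",
    email=None,
    phone=None,
    pain_point="20 hours a week lost to manual data entry",
    budget_mentioned=True,
    next_step="Start next quarter",
    urgency_score=6,
)

# Plain dict for an ORM insert, or a JSON string for an HTTP call to a CRM
row = lead.model_dump()
payload = lead.model_dump_json()
print(row["company"])  # Acme Corp
```

This is the payoff of the "function with a guaranteed return type" framing: downstream code works with typed attributes, not string parsing.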
Step 4: Handle Nested Schemas
Real-world extractions often have nested data. Pydantic handles this cleanly.
```python
from pydantic import BaseModel
from typing import List, Optional

class ContactInfo(BaseModel):
    email: Optional[str]
    phone: Optional[str]
    preferred_channel: str

class CompetitorMention(BaseModel):
    name: str
    sentiment: str  # "positive", "negative", "neutral"

class SalesCallAnalysis(BaseModel):
    lead_name: str
    company: str
    contact: ContactInfo
    pain_points: List[str]
    competitors_mentioned: List[CompetitorMention]
    deal_stage: str  # "discovery", "proposal", "negotiation", "closed"
    follow_up_date: Optional[str]
    confidence_score: int  # 1-10

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze this sales call and extract all relevant data."},
        {"role": "user", "content": transcript}
    ],
    response_format=SalesCallAnalysis
)

analysis = response.choices[0].message.parsed
for pain in analysis.pain_points:
    print(f"- {pain}")
```
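Before wiring the nested model into an API call, you can sanity-check it against a hand-written dict with `model_validate`, which recursively builds the nested models. A sketch (the sample data is made up):

```python
from typing import List, Optional
from pydantic import BaseModel

class ContactInfo(BaseModel):
    email: Optional[str]
    phone: Optional[str]
    preferred_channel: str

class CompetitorMention(BaseModel):
    name: str
    sentiment: str

class SalesCallAnalysis(BaseModel):
    lead_name: str
    company: str
    contact: ContactInfo
    pain_points: List[str]
    competitors_mentioned: List[CompetitorMention]
    deal_stage: str
    follow_up_date: Optional[str]
    confidence_score: int

# model_validate recursively constructs the nested models from plain dicts
analysis = SalesCallAnalysis.model_validate({
    "lead_name": "Sarah",
    "company": "Acme Corp",
    "contact": {"email": None, "phone": None, "preferred_channel": "email"},
    "pain_points": ["manual data entry", "slow reporting"],
    "competitors_mentioned": [{"name": "RivalCo", "sentiment": "neutral"}],
    "deal_stage": "discovery",
    "follow_up_date": None,
    "confidence_score": 7,
})

print(analysis.contact.preferred_channel)      # email
print(analysis.competitors_mentioned[0].name)  # RivalCo
```

If your test dict fails validation here, the model would have rejected real extractions too, so this catches schema design bugs early.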
Step 5: Validate with Explicit JSON Schema for Non-OpenAI Models
When you use Anthropic or other providers that do not support native structured outputs, enforce the schema yourself with Pydantic validation after the fact.
````python
import anthropic
import json
from pydantic import BaseModel, ValidationError

client = anthropic.Anthropic(api_key="YOUR_KEY")

class AdConcept(BaseModel):
    headline: str
    body_copy: str
    cta: str
    target_audience: str
    hook_type: str

def extract_structured(prompt: str, schema_class) -> BaseModel:
    schema = schema_class.model_json_schema()
    system = f"""You must respond ONLY with valid JSON matching this exact schema.
No markdown. No explanation. Just the JSON object.

Schema:
{json.dumps(schema, indent=2)}"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    )

    raw = response.content[0].text.strip()

    # Strip markdown fences if the model adds them anyway
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]

    try:
        data = json.loads(raw)
        return schema_class(**data)
    except (json.JSONDecodeError, ValidationError) as e:
        raise ValueError(f"Model returned invalid schema: {e}\nRaw: {raw}")

concept = extract_structured(
    "Generate an ad concept for a B2B SaaS product that saves teams 10 hours a week.",
    AdConcept
)
print(concept.headline)
````
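The fence-stripping and validation fallback can be exercised without an API call by feeding it a canned response. A sketch with a hypothetical `parse_response` helper that mirrors the logic above (the fence string is built programmatically to keep the example self-contained):

```python
import json
from pydantic import BaseModel, ValidationError

class AdConcept(BaseModel):
    headline: str
    body_copy: str
    cta: str
    target_audience: str
    hook_type: str

FENCE = "`" * 3  # a literal markdown code fence

def parse_response(raw: str, schema_class):
    """Mirror of the fence-stripping + validation logic above."""
    raw = raw.strip()
    if raw.startswith(FENCE):
        raw = raw.split(FENCE)[1]
        if raw.startswith("json"):
            raw = raw[4:]
    try:
        data = json.loads(raw)
        return schema_class(**data)
    except (json.JSONDecodeError, ValidationError) as e:
        raise ValueError(f"Model returned invalid schema: {e}\nRaw: {raw}")

# Simulated model reply that wraps the JSON in a fence despite instructions
body = json.dumps({
    "headline": "Win Back 10 Hours",
    "body_copy": "Automate the busywork.",
    "cta": "Book a demo",
    "target_audience": "ops leads",
    "hook_type": "time-saving",
})
canned = f"{FENCE}json\n{body}\n{FENCE}"

concept = parse_response(canned, AdConcept)
print(concept.headline)  # Win Back 10 Hours
```

Keeping the parsing logic in a separate function like this makes it unit-testable, which matters once the fallback path is load-bearing in production.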
Step 6: Build a Reusable Structured Extractor
Wrap the pattern into a utility function you can drop into any project.
```python
from openai import OpenAI
from pydantic import BaseModel
from typing import Type, TypeVar

T = TypeVar("T", bound=BaseModel)
client = OpenAI(api_key="YOUR_API_KEY")

def extract(
    content: str,
    schema: Type[T],
    system_prompt: str = "Extract the requested information from the provided text.",
    model: str = "gpt-4o-2024-08-06"
) -> T:
    response = client.beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content}
        ],
        response_format=schema
    )
    return response.choices[0].message.parsed

# Usage anywhere in your codebase
result = extract(transcript, LeadExtraction)
result = extract(article_text, SalesCallAnalysis)
```
What to Build Next
- Add enum types to your Pydantic models to constrain field values to a fixed set of allowed options
- Build a structured output validator that runs on a sample of your production extractions weekly to catch model drift
- Chain structured outputs so the output of one model call becomes the validated input to the next step
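The first suggestion above can be sketched with `enum.Enum` (or `typing.Literal`): invalid values then fail validation instead of slipping through as free-form strings. The `Sentiment` values mirror the comment in the earlier `CompetitorMention` model; everything else here is illustrative:

```python
from enum import Enum
from pydantic import BaseModel, ValidationError

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class CompetitorMention(BaseModel):
    name: str
    sentiment: Sentiment  # only the three values above pass validation

# Valid value: the string is coerced into the enum member
ok = CompetitorMention(name="RivalCo", sentiment="neutral")
print(ok.sentiment.value)  # neutral

# Invalid value: Pydantic rejects it instead of storing a free-form string
try:
    CompetitorMention(name="RivalCo", sentiment="meh")
except ValidationError as e:
    print("rejected:", e.error_count(), "error(s)")
```

Enums also flow into the generated JSON schema, so schema-enforcing providers constrain the model to the allowed values at generation time.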
Related Reading
- How to Write System Prompts That Control AI Behavior - system prompt design matters for reliable schema compliance
- How to Build AI Guardrails for Safe Outputs - layer safety checks on top of structured extraction
- How to Build Persona-Based AI Assistants - persona systems that return structured data for CRM sync
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment