How to Optimize Batch AI Processing for Cost
Process large AI workloads at a fraction of the cost using batch APIs.
Jay Banlasan
The AI Systems Guy
I had a workflow enriching 2,000 leads every night with AI-generated summaries. Running them synchronously was costing $180/month and taking 45 minutes. After switching to the batch API, the same job costs $90/month, and I don't care how long it takes because it runs overnight. Half the cost, zero impact on speed where it matters.
Batch processing is the biggest underused lever in AI operations. Most providers offer 50% discounts on batch API calls because they can schedule them during off-peak hours. The only requirement is that you can tolerate a delay (usually under 24 hours). For nightly enrichment, weekly reporting, and bulk classification tasks, that's almost always fine.
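The intro numbers work out cleanly at a 50% discount. Here's a back-of-envelope sketch; the $2 blended price per million tokens and 1,500 tokens per lead are illustrative assumptions, not quoted rates, so plug in your provider's current pricing:

```python
def monthly_cost(leads_per_night: int, tokens_per_lead: int,
                 price_per_mtok: float, batch_discount: float = 0.5) -> float:
    """Rough monthly spend for a nightly enrichment job."""
    tokens_per_month = leads_per_night * tokens_per_lead * 30
    full_price = tokens_per_month / 1_000_000 * price_per_mtok
    return full_price * (1 - batch_discount)

# 2,000 leads/night at ~1,500 tokens each, $2/Mtok blended
sync = monthly_cost(2000, 1500, 2.0, batch_discount=0.0)   # $180.00
batch = monthly_cost(2000, 1500, 2.0)                      # $90.00
print(f"sync ${sync:.2f}/mo vs batch ${batch:.2f}/mo")
```

The discount applies to tokens, not requests, so the savings scale linearly with volume.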
What You Need Before Starting
- Python 3.10+
- The anthropic SDK installed (pip install anthropic)
- A list of tasks to process (at least 50 to make batching worthwhile)
- Tasks that can tolerate up to 24-hour turnaround
Step 1: Identify What Qualifies for Batch Processing
Not everything should go to batch. Use this decision rule:
Real-time required (< 5 seconds) → synchronous API
Background OK (minutes to hours) → async queue workers
Overnight OK (up to 24 hours) → batch API (50% cheaper)
Good batch candidates: lead enrichment, document classification, content generation, email personalization at scale, weekly summaries, SEO meta descriptions, product tag generation.
Bad batch candidates: customer support replies, live tool calls, anything inside a user-facing request loop.
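The decision rule above can be encoded as a tiny router. This is a sketch: `max_latency_s` (how stale a result can be) and the 4-hour queue cutoff are assumptions you'd tune for your own stack, not part of any API:

```python
def choose_path(max_latency_s: float) -> str:
    """Route a task by how long its result can wait."""
    if max_latency_s < 5:
        return "sync"    # real-time: synchronous API
    if max_latency_s < 4 * 3600:
        return "queue"   # background: async queue workers
    return "batch"       # overnight OK: batch API, 50% cheaper
```

Anything that tolerates more than a few hours of delay should default to batch; the queue tier is only for work a human is actively waiting on later today.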
Step 2: Build the Batch Submission Function
Anthropic's Message Batches API accepts up to 10,000 requests in one submission.
import anthropic
import json
from pathlib import Path
client = anthropic.Anthropic()
def submit_batch(tasks: list[dict], batch_name: str) -> str:
    """
    tasks = list of dicts with 'id' and 'prompt' keys.
    Returns batch_id for polling later.
    """
    # Build batch requests with custom IDs so you can match results back.
    # The SDK accepts plain dicts for each request's params.
    batch_requests = [
        {
            "custom_id": task["id"],
            "params": {
                "model": "claude-3-haiku-20240307",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": task["prompt"]}],
            },
        }
        for task in tasks
    ]
    batch = client.messages.batches.create(requests=batch_requests)

    # Save batch metadata locally
    meta = {"batch_id": batch.id, "name": batch_name, "task_count": len(tasks)}
    Path(f"batch_{batch_name}.json").write_text(json.dumps(meta))
    print(f"Submitted batch {batch.id} with {len(tasks)} tasks")
    return batch.id
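If your task list ever exceeds the 10,000-request cap, split it before submitting. A small pure helper for that (the cap constant mirrors the documented limit; feed each chunk to submit_batch above):

```python
def chunk_tasks(tasks: list[dict], max_size: int = 10_000) -> list[list[dict]]:
    """Split a task list into submission-sized chunks."""
    return [tasks[i:i + max_size] for i in range(0, len(tasks), max_size)]

# usage:
# for n, chunk in enumerate(chunk_tasks(tasks)):
#     submit_batch(chunk, f"{batch_name}_part{n}")
```

Keeping the splitter separate from submission also makes it trivial to unit-test without hitting the API.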
Step 3: Build the Lead Enrichment Task Generator
Here's a real example: enriching a list of leads with AI-generated company summaries.
def build_lead_tasks(leads: list[dict]) -> list[dict]:
    tasks = []
    for lead in leads:
        prompt = f"""You are a B2B research assistant.
Summarize this company in 2 sentences for a sales rep.
Focus on: what they do, their likely pain points, and who buys from them.

Company: {lead['company']}
Industry: {lead.get('industry', 'unknown')}
Website: {lead.get('website', 'unknown')}
Employee count: {lead.get('employees', 'unknown')}

Be specific. No filler."""
        tasks.append({
            "id": f"lead_{lead['id']}",
            "prompt": prompt
        })
    return tasks
# Example usage
leads = [
    {"id": "001", "company": "Acme Corp", "industry": "Construction",
     "website": "acme.com", "employees": "50"},
    # ... up to 10,000
]
tasks = build_lead_tasks(leads)
batch_id = submit_batch(tasks, "lead_enrichment_2024_07_28")
Step 4: Poll for Completion
Batches can complete in minutes or up to 24 hours. Poll every 5 minutes and process when done.
import time
def wait_for_batch(batch_id: str, poll_interval: int = 300) -> list:
    """Polls until the batch is complete. Returns a list of results."""
    print(f"Polling batch {batch_id}...")
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        if status == "ended":
            print("Batch complete. Processing results...")
            return collect_results(batch_id)
        counts = batch.request_counts
        print(f"Status: {status} | Processing: {counts.processing} | "
              f"Succeeded: {counts.succeeded} | "
              f"Errored: {counts.errored}")
        time.sleep(poll_interval)
def collect_results(batch_id: str) -> list:
    results = []
    for result in client.messages.batches.results(batch_id):
        if result.result.type == "succeeded":
            msg = result.result.message
            results.append({
                "id": result.custom_id,
                "text": msg.content[0].text,
                "input_tokens": msg.usage.input_tokens,
                "output_tokens": msg.usage.output_tokens
            })
        else:
            # result.result.type is "errored", "canceled", or "expired"
            results.append({
                "id": result.custom_id,
                "text": None,
                "error": result.result.type
            })
    return results
Step 5: Write Results Back to Your Database
Match results by custom_id back to your original records.
import sqlite3
def save_enrichment_results(results: list, db_path: str = "leads.db"):
    conn = sqlite3.connect(db_path)
    success = 0
    failed = 0
    for result in results:
        lead_id = result["id"].replace("lead_", "")
        if result["text"]:
            conn.execute(
                "UPDATE leads SET ai_summary = ?, enriched_at = datetime('now') "
                "WHERE id = ?",
                (result["text"], lead_id),
            )
            success += 1
        else:
            conn.execute(
                "UPDATE leads SET enrichment_error = ? WHERE id = ?",
                (result.get("error", "failed"), lead_id),
            )
            failed += 1
    conn.commit()
    conn.close()
    print(f"Saved: {success} successful, {failed} failed")
Step 6: Schedule as a Nightly Cron Job
Put the whole pipeline in one script and schedule it.
# enrich_leads_batch.py
import sqlite3
from datetime import date

if __name__ == "__main__":
    # 1. Pull unenriched leads from DB
    conn = sqlite3.connect("leads.db")
    rows = conn.execute(
        "SELECT id, company, industry, website, employees FROM leads "
        "WHERE ai_summary IS NULL LIMIT 2000"
    ).fetchall()
    conn.close()
    leads = [{"id": r[0], "company": r[1], "industry": r[2],
              "website": r[3], "employees": r[4]} for r in rows]
    if not leads:
        print("No leads to enrich. Exiting.")
        raise SystemExit(0)

    # 2. Submit batch
    tasks = build_lead_tasks(leads)
    batch_id = submit_batch(tasks, f"nightly_{date.today().isoformat()}")

    # 3. Wait and collect
    results = wait_for_batch(batch_id, poll_interval=300)

    # 4. Save
    save_enrichment_results(results)
# crontab entry — runs at 2am daily
0 2 * * * /usr/bin/python3 /root/scripts/enrich_leads_batch.py >> /var/log/lead_enrichment.log 2>&1
What to Build Next
- Add retry logic for the errored results in each batch run rather than skipping them permanently
- Build a cost comparison report that shows batch vs. synchronous spend month over month
- Chain batch enrichment with a second batch pass for leads that received short or low-quality summaries
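The retry idea in the first bullet is mostly bookkeeping: pull the failed IDs out of a results list (same shape as collect_results returns) and rebuild tasks for just those. A sketch, where `task_lookup` is a hypothetical dict mapping task id to the original task dict:

```python
def build_retry_tasks(results: list[dict], task_lookup: dict) -> list[dict]:
    """Return the original tasks for every result that failed."""
    failed_ids = [r["id"] for r in results if r["text"] is None]
    return [task_lookup[i] for i in failed_ids if i in task_lookup]

# usage:
# retry_tasks = build_retry_tasks(results, {t["id"]: t for t in tasks})
# if retry_tasks:
#     submit_batch(retry_tasks, "retry_pass")
```

Cap it at one or two retry passes so a systematically failing prompt doesn't resubmit forever.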
Related Reading
- How to Build a Multi-Model AI Router - choose the right model before sending to batch
- How to Build Automatic Model Failover Systems - handle batch submission errors with fallback providers
- How to Build AI Request Throttling Systems - throttle re-submission of failed batch items
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment