How to Build Automatic Model Failover Systems
Automatically switch AI providers when your primary model goes down.
Jay Banlasan
The AI Systems Guy
Anthropic had a partial outage on a Tuesday afternoon, and my client-facing tool went silent for 20 minutes before I caught it manually. That was the last time I ran a single-provider AI setup. Every production system I build now has automatic model failover, so outages are invisible to end users: the primary goes down, the backup picks up, and a Slack message tells me about it.
Provider outages are not rare events; they're a predictable cost of running AI in production. If your system has no fallback, your reliability ceiling is whatever the provider's uptime SLA says. Most SLAs promise 99.9%, which still allows 8.76 hours of downtime per year. That's not good enough for customer-facing workflows.
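To put numbers on that ceiling, here's the back-of-the-envelope math, assuming the two providers fail independently (an approximation; shared infrastructure can correlate outages):

```python
# Downtime math for a 99.9% uptime SLA.
HOURS_PER_YEAR = 24 * 365  # 8760

# One provider: down 0.1% of the time.
single_down_hours = HOURS_PER_YEAR * 0.001

# Two independent providers: both down at once 0.1% * 0.1% of the time.
both_down_seconds = HOURS_PER_YEAR * 0.001 * 0.001 * 3600

print(f"Single provider: {single_down_hours:.2f} hours of downtime/year")
print(f"Two providers:   {both_down_seconds:.1f} seconds of downtime/year")
```

Going from roughly 8.76 hours to roughly half a minute of expected overlap is the whole argument for a fallback chain.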
What You Need Before Starting
- Python 3.10+
- anthropic and openai SDKs installed
- API keys for at least two providers
- tenacity for retry logic (pip install tenacity)
Step 1: Define Your Provider Chain
List providers in priority order. Primary runs first. Fallbacks activate in sequence.
import os

PROVIDER_CHAIN = [
    {
        "name": "anthropic",
        "model": "claude-haiku-4-5",
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "priority": 1
    },
    {
        "name": "openai",
        "model": "gpt-4o-mini",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "priority": 2
    },
    {
        "name": "anthropic-opus",
        "model": "claude-opus-4-5",
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "priority": 3
    }
]
Notice the third entry uses the same Anthropic key but a different model. If Haiku is down due to a model-specific issue (it happens), Opus may still respond. This isn't just provider failover; it's model failover within the same provider.
Step 2: Build Provider-Specific Call Functions
Each provider gets its own call function that returns a normalized response.
import anthropic
import openai

def call_anthropic(prompt: str, model: str, api_key: str) -> str:
    client = anthropic.Anthropic(api_key=api_key)
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def call_openai(prompt: str, model: str, api_key: str) -> str:
    client = openai.OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024
    )
    return response.choices[0].message.content

CALL_FUNCTIONS = {
    "anthropic": call_anthropic,
    "openai": call_openai,
    "anthropic-opus": call_anthropic,
}
Step 3: Build the Failover Dispatcher
This is the core function. It walks the provider chain and tries each one. On success it returns immediately. On failure it logs and continues.
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("failover")

class AllProvidersFailedError(Exception):
    pass

def ai_with_failover(prompt: str) -> dict:
    errors = []
    for provider in PROVIDER_CHAIN:
        name = provider["name"]
        model = provider["model"]
        fn = CALL_FUNCTIONS[name]
        try:
            start = time.time()
            result = fn(prompt, model, provider["api_key"])
            latency = time.time() - start
            if provider["priority"] > 1:
                logger.warning(f"FAILOVER: responded via {name}/{model} "
                               f"({len(errors)} higher-priority provider(s) failed)")
            return {
                "text": result,
                "provider": name,
                "model": model,
                "latency": round(latency, 3),
                "failover": provider["priority"] > 1
            }
        except Exception as e:
            logger.error(f"Provider {name}/{model} failed: {type(e).__name__}: {e}")
            errors.append({"provider": name, "error": str(e)})
            continue
    raise AllProvidersFailedError(f"All providers failed: {errors}")
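One refinement worth considering: a transient 429 or connection blip doesn't mean a provider is down, so a single quick retry on the same provider can avoid an unnecessary failover. A minimal stdlib sketch of that idea (tenacity, listed in the prerequisites, gives you the same pattern as a decorator via @retry with stop_after_attempt and wait_exponential); the helper name call_with_retry is my own, not from the SDKs:

```python
import time

def call_with_retry(fn, prompt, model, api_key, attempts=2, base_delay=0.5):
    """Retry a provider call with exponential backoff before giving up on it."""
    for attempt in range(attempts):
        try:
            return fn(prompt, model, api_key)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the dispatcher fail over
            time.sleep(base_delay * (2 ** attempt))
```

In the dispatcher, swapping fn(prompt, model, provider["api_key"]) for call_with_retry(fn, prompt, model, provider["api_key"]) gives each provider one second chance before the chain moves on.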
Step 4: Add Circuit Breaker Logic
Retrying a provider that's been down for an hour wastes latency on every request. A circuit breaker skips known-bad providers temporarily.
from collections import defaultdict
from datetime import datetime, timedelta

CIRCUIT_OPEN_DURATION = timedelta(minutes=5)
_circuit_failures: dict[str, list] = defaultdict(list)
_circuit_open_until: dict[str, datetime] = {}

def is_circuit_open(provider_name: str) -> bool:
    open_until = _circuit_open_until.get(provider_name)
    if open_until and datetime.utcnow() < open_until:
        return True
    return False

def record_failure(provider_name: str):
    _circuit_failures[provider_name].append(datetime.utcnow())
    # Remove failures older than 10 minutes
    cutoff = datetime.utcnow() - timedelta(minutes=10)
    _circuit_failures[provider_name] = [
        t for t in _circuit_failures[provider_name] if t > cutoff
    ]
    # Trip circuit after 3 failures in 10 minutes
    if len(_circuit_failures[provider_name]) >= 3:
        _circuit_open_until[provider_name] = datetime.utcnow() + CIRCUIT_OPEN_DURATION
        logger.warning(f"Circuit OPEN for {provider_name} for {CIRCUIT_OPEN_DURATION}")

def record_success(provider_name: str):
    _circuit_failures[provider_name] = []
    _circuit_open_until.pop(provider_name, None)
Integrate into the dispatcher by adding two checks per provider:
# Inside the for loop in ai_with_failover, before calling fn:
if is_circuit_open(name):
    logger.info(f"Skipping {name} — circuit open")
    continue
# In the except block, before continue:
record_failure(name)
# In the success return block, before returning:
record_success(name)
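The breaker's behavior is easy to sanity-check in isolation before wiring it in: three failures open the circuit, one success resets it. Here's a condensed, logging-free copy of the breaker above as a standalone script, using a hypothetical provider name:

```python
from collections import defaultdict
from datetime import datetime, timedelta

CIRCUIT_OPEN_DURATION = timedelta(minutes=5)
_failures = defaultdict(list)
_open_until = {}

def is_circuit_open(name):
    until = _open_until.get(name)
    return bool(until and datetime.utcnow() < until)

def record_failure(name):
    now = datetime.utcnow()
    # Keep only failures within the 10-minute window, then check the threshold.
    _failures[name] = [t for t in _failures[name] + [now]
                       if t > now - timedelta(minutes=10)]
    if len(_failures[name]) >= 3:
        _open_until[name] = now + CIRCUIT_OPEN_DURATION

def record_success(name):
    _failures[name] = []
    _open_until.pop(name, None)

# Three failures trip the breaker; one success resets it.
for _ in range(3):
    record_failure("test-provider")
assert is_circuit_open("test-provider")

record_success("test-provider")
assert not is_circuit_open("test-provider")
```

If those assertions pass, the thresholds behave the way Step 4 describes.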
Step 5: Add Slack Alerting on Failover
You need to know when your primary is down. Don't wait for a customer to tell you.
import os
import requests

SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK_URL")

def notify_failover(primary: str, active: str, error: str):
    if not SLACK_WEBHOOK:
        return
    msg = (f":rotating_light: AI Failover Activated\n"
           f"Primary: `{primary}` is down\n"
           f"Active: `{active}` is handling requests\n"
           f"Error: {error}")
    requests.post(SLACK_WEBHOOK, json={"text": msg}, timeout=5)
Call notify_failover() inside the failover dispatcher when provider["priority"] > 1 on a successful response.
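Concretely, the hook in the success branch looks something like this. The snippet stubs out notify_failover with a plain list so it runs standalone; in production you'd use the real Slack function from Step 5:

```python
alerts = []

def notify_failover(primary: str, active: str, error: str):
    # Stub standing in for the Slack alert from Step 5.
    alerts.append((primary, active, error))

# Simulated dispatcher state: the primary failed, a fallback answered.
errors = [{"provider": "anthropic", "error": "503 overloaded"}]
provider = {"name": "openai", "priority": 2}

# The hook: fire the alert only when a non-primary provider answered.
if provider["priority"] > 1 and errors:
    notify_failover(
        primary=errors[0]["provider"],
        active=provider["name"],
        error=errors[-1]["error"],
    )

print(alerts)  # [('anthropic', 'openai', '503 overloaded')]
```

Guarding on both priority > 1 and a non-empty errors list avoids firing an alert when a lower-priority provider answered only because higher ones were skipped by an open circuit.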
Step 6: Test Your Failover
Never assume it works. Test it before it matters.
# Test by temporarily giving your primary a bad API key.
# Note: PROVIDER_CHAIN read the env vars at definition time, so changing
# os.environ here does nothing; overwrite the chain entry directly.
PROVIDER_CHAIN[0]["api_key"] = "invalid-key-for-testing"

result = ai_with_failover("What is 2 + 2?")
print(result)

# Should return via the openai fallback with failover=True
assert result["failover"] is True
print("Failover test passed")
Run this in staging every deploy. A failover system that's never been tested is a false sense of security.
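For CI, you don't even need real API keys: stub the call functions and assert the dispatcher walks the chain. A self-contained sketch with a miniature dispatcher mirroring the Step 3 logic (the stubs and the dispatch helper are hypothetical test fixtures, not part of the production code):

```python
def failing_call(prompt, model, api_key):
    # Simulates the primary provider being down.
    raise ConnectionError("simulated outage")

def working_call(prompt, model, api_key):
    return f"answer from {model}"

def dispatch(prompt, chain):
    """Miniature ai_with_failover: walk the chain, return the first success."""
    errors = []
    for p in chain:
        try:
            return {"text": p["fn"](prompt, p["model"], "test-key"),
                    "provider": p["name"],
                    "failover": p["priority"] > 1}
        except Exception as e:
            errors.append(str(e))
    raise RuntimeError(f"All providers failed: {errors}")

chain = [
    {"name": "anthropic", "model": "primary-model", "priority": 1, "fn": failing_call},
    {"name": "openai", "model": "backup-model", "priority": 2, "fn": working_call},
]

result = dispatch("What is 2 + 2?", chain)
assert result["provider"] == "openai"
assert result["failover"] is True
print("Failover path verified without touching a real API")
```

This version runs in milliseconds on every commit, so the staging test with real keys becomes a periodic check rather than the only line of defense.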
What to Build Next
- Add health check endpoints that ping each provider every 60 seconds and pre-trip circuits before real requests fail
- Log all failover events to your cost dashboard to measure how often each provider gets bypassed
- Build a status page that shows provider health in real time so your team knows the state of the system
Related Reading
- How to Build a Multi-Model AI Router - route by capability before failover handles availability
- How to Optimize Batch AI Processing for Cost - batch processing and failover work together in high-volume pipelines
- How to Build AI Request Throttling Systems - throttle requests to avoid triggering rate limits that look like outages
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment