How to Build Automatic Model Failover Systems
Automatically switch AI providers when your primary model goes down.
Jay Banlasan
The AI Systems Guy
Anthropic had a partial outage on a Tuesday afternoon, and my client-facing tool went silent for 20 minutes before I caught it manually. That was the last time I ran a single-provider AI setup. Every production system I build now has automatic model failover, so outages are invisible to end users: the primary goes down, the backup picks up, and a Slack message tells me about it.
Provider outages are not rare events; they're a predictable cost of running AI in production. If your system has no fallback, your reliability ceiling is whatever the provider's uptime SLA says. Most SLAs promise 99.9%, which still allows 8.76 hours of downtime per year. That's not good enough for customer-facing workflows.
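To put numbers on that ceiling, here's the back-of-the-envelope math, assuming the two providers fail independently (an approximation; shared infrastructure can correlate outages):

```python
# Downtime math for a 99.9% uptime SLA.
HOURS_PER_YEAR = 24 * 365  # 8760

# One provider: down 0.1% of the time.
single_down_hours = HOURS_PER_YEAR * 0.001

# Two independent providers: both down at once 0.1% * 0.1% of the time.
both_down_seconds = HOURS_PER_YEAR * 0.001 * 0.001 * 3600

print(f"Single provider: {single_down_hours:.2f} hours of downtime/year")
print(f"Two providers:   {both_down_seconds:.1f} seconds of downtime/year")
```

Going from roughly 8.76 hours to roughly half a minute of expected overlap is the whole argument for a fallback chain.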
What You Need Before Starting
- Python 3.10+
- anthropic and openai SDKs installed
- API keys for at least two providers
- tenacity for retry logic (pip install tenacity)
Step 1: Define Your Provider Chain
List providers in priority order. Primary runs first. Fallbacks activate in sequence.
import os

PROVIDER_CHAIN = [
    {
        "name": "anthropic",
        "model": "claude-haiku-4-5",
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "priority": 1
    },
    {
        "name": "openai",
        "model": "gpt-4o-mini",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "priority": 2
    },
    {
        "name": "anthropic-opus",
        "model": "claude-opus-4-5",
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "priority": 3
    }
]
Notice the third entry uses the same Anthropic key but a different model. If Haiku is down due to a model-specific issue (it happens), Opus may still respond. This isn't just provider failover; it's model failover within the same provider.
Step 2: Build Provider-Specific Call Functions
Each provider gets its own call function that returns a normalized response.
import anthropic
import openai

def call_anthropic(prompt: str, model: str, api_key: str) -> str:
    client = anthropic.Anthropic(api_key=api_key)
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def call_openai(prompt: str, model: str, api_key: str) -> str:
    client = openai.OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024
    )
    return response.choices[0].message.content

CALL_FUNCTIONS = {
    "anthropic": call_anthropic,
    "openai": call_openai,
    "anthropic-opus": call_anthropic,
}
Step 3: Build the Failover Dispatcher
This is the core function. It walks the provider chain and tries each one. On success it returns immediately. On failure it logs and continues.
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("failover")

class AllProvidersFailedError(Exception):
    pass

def ai_with_failover(prompt: str) -> dict:
    errors = []
    for provider in PROVIDER_CHAIN:
        name = provider["name"]
        model = provider["model"]
        fn = CALL_FUNCTIONS[name]
        try:
            start = time.time()
            result = fn(prompt, model, provider["api_key"])
            latency = time.time() - start
            if provider["priority"] > 1:
                logger.warning(f"FAILOVER: responded via {name}/{model} "
                               f"({len(errors)} higher-priority provider(s) failed)")
            return {
                "text": result,
                "provider": name,
                "model": model,
                "latency": round(latency, 3),
                "failover": provider["priority"] > 1
            }
        except Exception as e:
            logger.error(f"Provider {name}/{model} failed: {type(e).__name__}: {e}")
            errors.append({"provider": name, "error": str(e)})
            continue
    raise AllProvidersFailedError(f"All providers failed: {errors}")
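One refinement worth considering: a transient 429 or connection blip doesn't mean a provider is down, so a single quick retry on the same provider can avoid an unnecessary failover. A minimal stdlib sketch of that idea (tenacity, listed in the prerequisites, gives you the same pattern as a decorator via @retry with stop_after_attempt and wait_exponential); the helper name call_with_retry is my own, not from the SDKs:

```python
import time

def call_with_retry(fn, prompt, model, api_key, attempts=2, base_delay=0.5):
    """Retry a provider call with exponential backoff before giving up on it."""
    for attempt in range(attempts):
        try:
            return fn(prompt, model, api_key)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the dispatcher fail over
            time.sleep(base_delay * (2 ** attempt))
```

In the dispatcher, swapping fn(prompt, model, provider["api_key"]) for call_with_retry(fn, prompt, model, provider["api_key"]) gives each provider one second chance before the chain moves on.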
Step 4: Add Circuit Breaker Logic
Retrying a provider that's been down for an hour wastes latency on every request. A circuit breaker skips known-bad providers temporarily.
from collections import defaultdict
from datetime import datetime, timedelta

CIRCUIT_OPEN_DURATION = timedelta(minutes=5)
_circuit_failures: dict[str, list] = defaultdict(list)
_circuit_open_until: dict[str, datetime] = {}

def is_circuit_open(provider_name: str) -> bool:
    open_until = _circuit_open_until.get(provider_name)
    if open_until and datetime.utcnow() < open_until:
        return True
    return False

def record_failure(provider_name: str):
    _circuit_failures[provider_name].append(datetime.utcnow())
    # Remove failures older than 10 minutes
    cutoff = datetime.utcnow() - timedelta(minutes=10)
    _circuit_failures[provider_name] = [
        t for t in _circuit_failures[provider_name] if t > cutoff
    ]
    # Trip circuit after 3 failures in 10 minutes
    if len(_circuit_failures[provider_name]) >= 3:
        _circuit_open_until[provider_name] = datetime.utcnow() + CIRCUIT_OPEN_DURATION
        logger.warning(f"Circuit OPEN for {provider_name} for {CIRCUIT_OPEN_DURATION}")

def record_success(provider_name: str):
    _circuit_failures[provider_name] = []
    _circuit_open_until.pop(provider_name, None)
Integrate into the dispatcher by adding two checks per provider:
# Inside the for loop in ai_with_failover, before calling fn:
if is_circuit_open(name):
    logger.info(f"Skipping {name} — circuit open")
    continue
# In the except block, before continue:
record_failure(name)
# In the success return block, before returning:
record_success(name)
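The breaker's behavior is easy to sanity-check in isolation before wiring it in: three failures open the circuit, one success resets it. Here's a condensed, logging-free copy of the breaker above as a standalone script, using a hypothetical provider name:

```python
from collections import defaultdict
from datetime import datetime, timedelta

CIRCUIT_OPEN_DURATION = timedelta(minutes=5)
_failures = defaultdict(list)
_open_until = {}

def is_circuit_open(name):
    until = _open_until.get(name)
    return bool(until and datetime.utcnow() < until)

def record_failure(name):
    now = datetime.utcnow()
    # Keep only failures within the 10-minute window, then check the threshold.
    _failures[name] = [t for t in _failures[name] + [now]
                       if t > now - timedelta(minutes=10)]
    if len(_failures[name]) >= 3:
        _open_until[name] = now + CIRCUIT_OPEN_DURATION

def record_success(name):
    _failures[name] = []
    _open_until.pop(name, None)

# Three failures trip the breaker; one success resets it.
for _ in range(3):
    record_failure("test-provider")
assert is_circuit_open("test-provider")

record_success("test-provider")
assert not is_circuit_open("test-provider")
```

If those assertions pass, the thresholds behave the way Step 4 describes.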
Step 5: Add Slack Alerting on Failover
You need to know when your primary is down. Don't wait for a customer to tell you.
import os
import requests

SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK_URL")

def notify_failover(primary: str, active: str, error: str):
    if not SLACK_WEBHOOK:
        return
    msg = (f":rotating_light: AI Failover Activated\n"
           f"Primary: `{primary}` is down\n"
           f"Active: `{active}` is handling requests\n"
           f"Error: {error}")
    requests.post(SLACK_WEBHOOK, json={"text": msg}, timeout=5)
Call notify_failover() inside the failover dispatcher when provider["priority"] > 1 on a successful response.
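Concretely, the hook in the success branch looks something like this. The snippet stubs out notify_failover with a plain list so it runs standalone; in production you'd use the real Slack function from Step 5:

```python
alerts = []

def notify_failover(primary: str, active: str, error: str):
    # Stub standing in for the Slack alert from Step 5.
    alerts.append((primary, active, error))

# Simulated dispatcher state: the primary failed, a fallback answered.
errors = [{"provider": "anthropic", "error": "503 overloaded"}]
provider = {"name": "openai", "priority": 2}

# The hook: fire the alert only when a non-primary provider answered.
if provider["priority"] > 1 and errors:
    notify_failover(
        primary=errors[0]["provider"],
        active=provider["name"],
        error=errors[-1]["error"],
    )

print(alerts)  # [('anthropic', 'openai', '503 overloaded')]
```

Guarding on both priority > 1 and a non-empty errors list avoids firing an alert when a lower-priority provider answered only because higher ones were skipped by an open circuit.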
Step 6: Test Your Failover
Never assume it works. Test it before it matters.
# Test by temporarily giving your primary a bad API key.
# Note: PROVIDER_CHAIN read the env vars at definition time, so changing
# os.environ here does nothing; overwrite the chain entry directly.
PROVIDER_CHAIN[0]["api_key"] = "invalid-key-for-testing"

result = ai_with_failover("What is 2 + 2?")
print(result)

# Should return via the openai fallback with failover=True
assert result["failover"] is True
print("Failover test passed")
Run this in staging every deploy. A failover system that's never been tested is a false sense of security.
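For CI, you don't even need real API keys: stub the call functions and assert the dispatcher walks the chain. A self-contained sketch with a miniature dispatcher mirroring the Step 3 logic (the stubs and the dispatch helper are hypothetical test fixtures, not part of the production code):

```python
def failing_call(prompt, model, api_key):
    # Simulates the primary provider being down.
    raise ConnectionError("simulated outage")

def working_call(prompt, model, api_key):
    return f"answer from {model}"

def dispatch(prompt, chain):
    """Miniature ai_with_failover: walk the chain, return the first success."""
    errors = []
    for p in chain:
        try:
            return {"text": p["fn"](prompt, p["model"], "test-key"),
                    "provider": p["name"],
                    "failover": p["priority"] > 1}
        except Exception as e:
            errors.append(str(e))
    raise RuntimeError(f"All providers failed: {errors}")

chain = [
    {"name": "anthropic", "model": "primary-model", "priority": 1, "fn": failing_call},
    {"name": "openai", "model": "backup-model", "priority": 2, "fn": working_call},
]

result = dispatch("What is 2 + 2?", chain)
assert result["provider"] == "openai"
assert result["failover"] is True
print("Failover path verified without touching a real API")
```

This version runs in milliseconds on every commit, so the staging test with real keys becomes a periodic check rather than the only line of defense.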
What to Build Next
- Add health check endpoints that ping each provider every 60 seconds and pre-trip circuits before real requests fail
- Log all failover events to your cost dashboard to measure how often each provider gets bypassed
- Build a status page that shows provider health in real time so your team knows the state of the system
Related Reading
- How to Build a Multi-Model AI Router - route by capability before failover handles availability
- How to Optimize Batch AI Processing for Cost - batch processing and failover work together in high-volume pipelines
- How to Build AI Request Throttling Systems - throttle requests to avoid triggering rate limits that look like outages
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment