
How to Set Up LiteLLM as Your AI Gateway

Use LiteLLM to access 100+ AI models through a single unified API.

Jay Banlasan


The AI Systems Guy

Managing direct integrations with 5 different AI providers is a maintenance nightmare. Every provider has a different SDK, different authentication, different request format, and different error types. LiteLLM solves this by giving you one OpenAI-compatible API that proxies to any provider you configure. You write your code once against the OpenAI interface and switch models by changing a string.

I use LiteLLM as the gateway layer in every production AI system I build now. It handles fallbacks, load balancing, cost tracking, and caching out of the box. What used to take 400 lines of custom code is now a config file and 20 lines of integration code.

What You Need Before Starting

Python 3.8 or newer with pip installed.
An API key for at least one provider you plan to route to (OpenAI, Anthropic, Google, Groq, and so on).
A terminal for running the proxy server in Step 5.

Step 1: Install LiteLLM

pip install litellm

For the proxy server (recommended for team environments):

pip install "litellm[proxy]"

Step 2: Basic Usage - Call Any Model with One Interface

LiteLLM wraps any provider in the OpenAI message format.

import litellm
import os

# Set your API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
os.environ["GEMINI_API_KEY"] = "your-google-key"

def complete(model: str, messages: list, **kwargs) -> str:
    response = litellm.completion(
        model=model,
        messages=messages,
        **kwargs
    )
    return response.choices[0].message.content

# Same code, different models
answer = complete("gpt-4o", [{"role": "user", "content": "What is 2+2?"}])
answer = complete("claude-3-5-sonnet-20241022", [{"role": "user", "content": "What is 2+2?"}])
answer = complete("gemini/gemini-1.5-pro", [{"role": "user", "content": "What is 2+2?"}])
answer = complete("groq/llama-3.1-70b-versatile", [{"role": "user", "content": "What is 2+2?"}])

All four calls use the identical interface. Switching providers is a one-word change.
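
LiteLLM routes on the model string itself: a provider/ prefix selects the provider explicitly, and well-known model names resolve without one. Here is a rough sketch of that dispatch idea (my own simplification with an illustrative subset of names, not LiteLLM's actual routing table):

```python
# Simplified sketch of how a LiteLLM-style model string maps to a provider.
# The real router covers 100+ providers and many aliases; this only
# illustrates the "provider/model" naming convention.

KNOWN_PREFIXES = {"openai", "anthropic", "gemini", "groq"}

# A few unprefixed names that resolve directly (illustrative subset)
DEFAULT_PROVIDERS = {
    "gpt-4o": "openai",
    "gpt-4o-mini": "openai",
    "claude-3-5-sonnet-20241022": "anthropic",
}

def resolve_provider(model: str) -> tuple:
    """Return (provider, model_name) for a LiteLLM-style model string."""
    if "/" in model:
        prefix, name = model.split("/", 1)
        if prefix in KNOWN_PREFIXES:
            return (prefix, name)
    if model in DEFAULT_PROVIDERS:
        return (DEFAULT_PROVIDERS[model], model)
    raise ValueError(f"Unknown model: {model}")

print(resolve_provider("gemini/gemini-1.5-pro"))  # ('gemini', 'gemini-1.5-pro')
print(resolve_provider("gpt-4o"))                 # ('openai', 'gpt-4o')
```

This is why the four calls above need nothing beyond the model string: the provider choice is encoded in the name itself.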

Step 3: Configure Fallbacks

Define a fallback chain in your completion call.

import litellm

litellm.set_verbose = False

def complete_with_fallback(
    messages: list,
    primary_model: str = "gpt-4o-mini",
    fallback_models: list = None,
    **kwargs
) -> dict:
    if fallback_models is None:
        fallback_models = ["claude-3-haiku-20240307", "gpt-4o"]

    response = litellm.completion(
        model=primary_model,
        messages=messages,
        fallbacks=fallback_models,
        **kwargs
    )

    return {
        "content": response.choices[0].message.content,
        "model_used": response.model,
        "usage": {
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens
        }
    }

result = complete_with_fallback(
    messages=[{"role": "user", "content": "Summarize the concept of compounding interest."}]
)
print(f"Response from: {result['model_used']}")
print(result["content"])
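
Under the hood, fallback behavior is essentially "try each model in order until one succeeds." A minimal sketch of that control flow (my own simplification, not LiteLLM's internals), with the API call injected as a function so the loop is easy to follow:

```python
# Minimal fallback loop: try the primary model, then each fallback in order.
# `call` stands in for litellm.completion; injecting it keeps the logic testable.

def with_fallbacks(call, messages, primary, fallbacks):
    last_error = None
    for model in [primary, *fallbacks]:
        try:
            return model, call(model, messages)
        except Exception as exc:  # real code would catch provider-specific errors
            last_error = exc
    raise RuntimeError(f"All models failed, last error: {last_error}")

# Demo with a fake caller whose primary model always errors
def fake_call(model, messages):
    if model == "gpt-4o-mini":
        raise TimeoutError("rate limited")
    return f"answer from {model}"

model, answer = with_fallbacks(
    fake_call,
    [{"role": "user", "content": "hi"}],
    primary="gpt-4o-mini",
    fallbacks=["claude-3-haiku-20240307", "gpt-4o"],
)
print(model)  # claude-3-haiku-20240307
```

The model_used field in the wrapper above exists for exactly this reason: when a fallback fires, you want to know which model actually answered.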

Step 4: Enable Cost Tracking

LiteLLM bundles per-model pricing data, so you can compute the cost of any response with completion_cost.

import litellm

def tracked_complete(messages: list, model: str = "gpt-4o-mini") -> dict:
    response = litellm.completion(model=model, messages=messages)

    cost = litellm.completion_cost(completion_response=response)

    return {
        "content": response.choices[0].message.content,
        "model": response.model,
        "cost_usd": round(cost, 6),
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens
    }

result = tracked_complete([{"role": "user", "content": "Write a haiku about APIs."}])
print(f"Cost: ${result['cost_usd']}")
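
The cost figure is just token counts times per-token prices. If you want to sanity-check what completion_cost returns, you can recompute it from a pricing table; the numbers below are illustrative placeholders, not current rates, so always check your provider's pricing page:

```python
# Recompute cost from token counts and per-million-token prices.
# These prices are made-up placeholders for illustration only.

PRICING_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICING_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("gpt-4o-mini", input_tokens=12, output_tokens=20)
print(f"${cost:.6f}")
```

Keeping your own pricing table also lets you budget before a call is made, not just account for it afterward.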

Step 5: Set Up the LiteLLM Proxy Server

For team environments, run LiteLLM as a proxy server that your whole team calls instead of individual providers.

Create litellm_config.yaml:

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: claude-haiku
    litellm_params:
      model: anthropic/claude-3-haiku-20240307
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 30

litellm_settings:
  success_callback: ["langfuse"]  # Optional: observability
  drop_params: true  # Drop unsupported params instead of erroring
  request_timeout: 60

general_settings:
  master_key: your-proxy-master-key  # Required for API auth
  database_url: "postgresql://user:password@localhost:5432/litellm"  # For budget tracking; LiteLLM's database features require Postgres
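
The model_name entries are aliases: clients ask for claude-sonnet and the proxy substitutes the full provider model plus the right API key. Conceptually the lookup is just a dict mirroring the config above:

```python
# How the proxy's model_list behaves conceptually: a friendly alias
# resolves to the underlying provider model. Mirrors the YAML config.

MODEL_LIST = {
    "gpt-4o": "openai/gpt-4o",
    "gpt-4o-mini": "openai/gpt-4o-mini",
    "claude-sonnet": "anthropic/claude-3-5-sonnet-20241022",
    "claude-haiku": "anthropic/claude-3-haiku-20240307",
}

def resolve_alias(model_name: str) -> str:
    try:
        return MODEL_LIST[model_name]
    except KeyError:
        raise ValueError(f"Model {model_name!r} is not in the proxy config")

print(resolve_alias("claude-sonnet"))  # anthropic/claude-3-5-sonnet-20241022
```

Because clients only ever see the alias, you can upgrade claude-sonnet to a newer model version by editing one line of YAML, with zero client changes.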

Start the proxy:

litellm --config litellm_config.yaml --port 4000

Step 6: Connect to the Proxy with the OpenAI SDK

Once the proxy is running, point the OpenAI client at it.

from openai import OpenAI

# Point to LiteLLM proxy instead of OpenAI directly
proxy_client = OpenAI(
    api_key="your-proxy-master-key",
    base_url="http://localhost:4000"
)

# Now call any model you configured
response = proxy_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Switch to Claude with zero code changes
response = proxy_client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Your entire codebase uses the OpenAI SDK. Provider switching is a config file change.
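
In real code you would avoid hardcoding the proxy URL and key; pull them from the environment so the same code runs locally and in production. A small sketch, where the env var names LITELLM_PROXY_URL and LITELLM_PROXY_KEY are my own choice, not a LiteLLM convention:

```python
import os

def proxy_client_config() -> dict:
    """Resolve proxy connection settings from env vars, with local defaults.
    LITELLM_PROXY_URL / LITELLM_PROXY_KEY are names chosen for this example."""
    return {
        "api_key": os.environ.get("LITELLM_PROXY_KEY", "your-proxy-master-key"),
        "base_url": os.environ.get("LITELLM_PROXY_URL", "http://localhost:4000"),
    }

# Usage: OpenAI(**proxy_client_config())
config = proxy_client_config()
print(config["base_url"])
```

This keeps secrets out of source control and makes switching between a local proxy and a deployed one a matter of environment configuration.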

Step 7: Add Budget Controls via the Proxy

Set per-user or per-team spending limits through the proxy API.

import requests

PROXY_URL = "http://localhost:4000"
PROXY_KEY = "your-proxy-master-key"

def create_budget_key(
    user_id: str,
    monthly_budget_usd: float,
    models_allowed: list = None
) -> str:
    """Create an API key with a monthly spending limit."""
    response = requests.post(
        f"{PROXY_URL}/key/generate",
        headers={"Authorization": f"Bearer {PROXY_KEY}"},
        json={
            "user_id": user_id,
            "max_budget": monthly_budget_usd,
            "budget_duration": "monthly",
            "models": models_allowed or ["gpt-4o-mini", "claude-haiku"],
            "metadata": {"created_for": user_id}
        }
    )
    return response.json()["key"]

def get_spend_report() -> dict:
    """Get current spend across all keys."""
    response = requests.get(
        f"{PROXY_URL}/global/spend",
        headers={"Authorization": f"Bearer {PROXY_KEY}"}
    )
    return response.json()

# Create a restricted key for a team member
team_key = create_budget_key("team-member-1", monthly_budget_usd=10.0)
print(f"Team key: {team_key}")
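
Once keys are live, you will usually want to roll the raw spend data up per user. The exact response shape of /global/spend varies by LiteLLM version, so treat the field names below as illustrative; the aggregation itself is straightforward:

```python
# Aggregate per-key spend records into a per-user report - the kind of
# summary you'd build on top of the spend endpoint. Field names are
# illustrative, not the exact API response schema.

from collections import defaultdict

def summarize_spend(records: list) -> dict:
    """records: [{"user_id": ..., "spend": ...}, ...] -> total spend per user."""
    totals = defaultdict(float)
    for r in records:
        totals[r["user_id"]] += r["spend"]
    return dict(totals)

report = summarize_spend([
    {"user_id": "team-member-1", "spend": 1.25},
    {"user_id": "team-member-1", "spend": 0.75},
    {"user_id": "team-member-2", "spend": 3.00},
])
print(report)  # {'team-member-1': 2.0, 'team-member-2': 3.0}
```

Pair this with the per-key max_budget above and you have both enforcement (the proxy blocks overspending keys) and visibility (you see who is spending what).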
