
How to Use Azure OpenAI Service for Business

Set up Azure OpenAI for enterprise-grade GPT access with data residency controls.

Jay Banlasan

The AI Systems Guy

Azure OpenAI service business setup is what you reach for when a client runs on Microsoft 365, has data in Azure, and needs GPT-4 without their data going to OpenAI's commercial API. Azure OpenAI is the same model, different infrastructure: your data stays in your Azure tenant, you get SLA-backed uptime, and you can put it behind a private endpoint. I have deployed this for a legal firm that needed GPT-4 for document review but could not accept OpenAI's standard terms.

The key difference from the direct OpenAI API: you deploy a specific model version to your own Azure resource. You own the endpoint. You control who has access via Azure Active Directory. The tradeoff depends on the SKU: a provisioned-throughput deployment bills for reserved capacity whether you use it or not, while a Standard deployment bills per token against a quota. Either way, pick and size the deployment deliberately.

What You Need Before Starting

An Azure subscription with Azure OpenAI access enabled, the Azure CLI installed and signed in (az login), Python with the openai package installed (pip install openai), and permission to create resources in a resource group.

Step 1: Create an Azure OpenAI Resource

# Create a resource group if you don't have one
az group create --name rg-ai-production --location eastus

# Create the Azure OpenAI resource
az cognitiveservices account create \
  --name your-openai-resource \
  --resource-group rg-ai-production \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --yes

Get your endpoint and key:

# Get the endpoint
az cognitiveservices account show \
  --name your-openai-resource \
  --resource-group rg-ai-production \
  --query "properties.endpoint" --output tsv

# Get the API key
az cognitiveservices account keys list \
  --name your-openai-resource \
  --resource-group rg-ai-production \
  --query "key1" --output tsv

Store both in your environment as AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY.
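For example, in your shell profile (both values below are placeholders; substitute what the commands above returned):

```shell
# Placeholders -- use the endpoint and key from the az commands above
export AZURE_OPENAI_ENDPOINT="https://your-openai-resource.openai.azure.com/"
export AZURE_OPENAI_KEY="your-key-here"
```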

Step 2: Deploy a Model

Azure OpenAI requires you to explicitly deploy model versions to your resource. Go to Azure OpenAI Studio (oai.azure.com), navigate to your resource, and click "Deployments", then "Create new deployment."

Or do it via CLI:

az cognitiveservices account deployment create \
  --name your-openai-resource \
  --resource-group rg-ai-production \
  --deployment-name gpt-4-production \
  --model-name gpt-4 \
  --model-version "turbo-2024-04-09" \
  --model-format OpenAI \
  --sku-capacity 10 \
  --sku-name "Standard"

The --deployment-name is what you reference in API calls. Naming it descriptively (gpt-4-production, gpt-35-drafts) makes it easier when you have multiple deployments.
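To confirm what is actually deployed on the resource before wiring up code, the CLI can list deployments:

```shell
# List all deployments on the resource in a readable table
az cognitiveservices account deployment list \
  --name your-openai-resource \
  --resource-group rg-ai-production \
  --output table
```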

Step 3: Make Your First API Call

The openai Python library handles Azure with an AzureOpenAI client. Point it at your resource.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-15-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
)

def chat_azure(
    user_message: str,
    deployment_name: str = "gpt-4-production",
    system_prompt: str = "You are a helpful business assistant.",
    max_tokens: int = 1024
) -> str:
    response = client.chat.completions.create(
        model=deployment_name,  # This is your deployment name, not the model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        max_tokens=max_tokens,
        temperature=0.7
    )
    return response.choices[0].message.content

# Test
result = chat_azure("Summarize the key benefits of data residency controls in two sentences.")
print(result)

The model parameter in the API call takes your deployment name, not "gpt-4". This is the most common mistake. If you pass "gpt-4", you get a 404.
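A cheap way to surface that mistake early is a client-side guard before the API call. The helper and the deployment set here are hypothetical, not part of the SDK; adjust the names to whatever you actually deployed:

```python
# Hypothetical: the deployment names you created in Step 2
KNOWN_DEPLOYMENTS = {"gpt-4-production", "gpt-35-drafts"}

def resolve_deployment(name: str) -> str:
    """Fail fast with a clear error instead of a confusing 404 from the service."""
    if name not in KNOWN_DEPLOYMENTS:
        raise ValueError(
            f"'{name}' is not a deployment on this resource. "
            f"Known deployments: {sorted(KNOWN_DEPLOYMENTS)}. "
            "Remember: pass the deployment name, not the model name."
        )
    return name
```

Passing "gpt-4" through this guard raises immediately with a readable message, instead of a 404 that surfaces minutes later in a log.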

Step 4: Add Managed Identity Authentication (Production)

API keys work, but Managed Identity is the better choice for production: there are no credentials to store, rotate, or leak.

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Uses the Azure identity chain (managed identity in Azure, CLI locally)
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version="2024-02-15-preview"
)

Install the identity library: pip install azure-identity. When running in Azure (App Service, AKS, Azure Functions), this picks up the managed identity automatically. Locally it uses your az login session.
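For token auth to work, the identity also needs a role assignment on the resource. Something like the following, where the principal ID and subscription ID are placeholders:

```shell
# Grant the app's managed identity the OpenAI user role on the resource
az role assignment create \
  --assignee <principal-id-of-your-managed-identity> \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-ai-production/providers/Microsoft.CognitiveServices/accounts/your-openai-resource
```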

Step 5: Handle Rate Limits and Throttling

Azure OpenAI throttles based on the capacity assigned to your deployment, measured in tokens per minute (TPM). Build retry logic:

import time
from openai import AzureOpenAI, RateLimitError, APIStatusError

def chat_with_retry(
    client: AzureOpenAI,
    messages: list,
    deployment_name: str,
    max_retries: int = 3
) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=deployment_name,
                messages=messages,
                max_tokens=1024
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except APIStatusError as e:
            if e.status_code == 429:
                retry_after = int(e.response.headers.get("Retry-After", 5))
                print(f"Throttled. Waiting {retry_after}s...")
                time.sleep(retry_after)
            else:
                raise

    # Only reached if every attempt landed in the 429 branch above
    raise RuntimeError(f"chat_with_retry: exhausted {max_retries} retries")
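When several workers retry in lockstep, plain exponential backoff can re-collide on the same TPM window. One common variant is full jitter; a minimal sketch, not part of the original code:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random wait in [0, min(cap, base * 2^attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Swapping this in for the fixed 2 ** attempt sleep spreads retries out instead of synchronizing them.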

Step 6: Set Up Usage Monitoring

Azure OpenAI usage shows up in Azure Monitor. Pull it programmatically:

from azure.monitor.query import MetricsQueryClient
from azure.identity import DefaultAzureCredential
from datetime import datetime, timedelta, timezone

def get_token_usage(resource_id: str, hours: int = 24):
    credential = DefaultAzureCredential()
    client = MetricsQueryClient(credential)

    # datetime.utcnow() is deprecated; use an aware UTC timestamp
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=hours)

    result = client.query_resource(
        resource_uri=resource_id,
        metric_names=["TokenTransaction"],
        timespan=(start_time, end_time),
        granularity=timedelta(hours=1)
    )

    for metric in result.metrics:
        print(f"Metric: {metric.name}")
        for ts in metric.timeseries:
            for dp in ts.data:
                if dp.total:
                    print(f"  {dp.timestamp}: {int(dp.total)} tokens")

Install the query library first: pip install azure-monitor-query. The resource_id is the full ARM resource ID for your Cognitive Services account; find it in the Azure portal under Properties.
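Token counts only turn into spend once you apply your rates. The prices below are illustrative placeholders, not current Azure pricing; check your own agreement:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_per_1k: float = 0.01, output_per_1k: float = 0.03) -> float:
    """Rough spend estimate. Default per-1K rates are illustrative, not real pricing."""
    return (prompt_tokens / 1000) * input_per_1k + (completion_tokens / 1000) * output_per_1k
```

Feed it the hourly TokenTransaction totals from the monitoring function above to get a running cost figure per deployment.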
