
How to Build a Voice-Activated AI Assistant for Business

Create a voice-controlled AI assistant for hands-free business operations.

Jay Banlasan


The AI Systems Guy

A voice-activated AI assistant for business operations lets you query data, create tasks, and get briefings hands-free. I build these for operators who need quick answers while multitasking: "What is our lead count this week?" or "Schedule a follow-up with the Johnson account," spoken aloud and handled instantly.

This combines speech recognition, AI processing, and text-to-speech into one loop.
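Each turn of that loop is a three-stage pipeline. Here is a minimal sketch of the shape; the function arguments are placeholders for the pieces built in the steps below:

```python
def assistant_turn(listen, think, speak):
    # listen: microphone -> text, think: text -> reply, speak: reply -> audio
    command = listen()
    if command is None:
        return None  # nothing recognized; skip this turn
    reply = think(command)
    speak(reply)
    return reply
```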

What You Need Before Starting

You need Python 3, a working microphone, an Anthropic API key for command processing, and an OpenAI API key for text-to-speech. The code below also relies on the SpeechRecognition library (with PyAudio for microphone access), a local Whisper model for transcription, and ffplay from FFmpeg for audio playback.
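Assuming the libraries used in the steps below, a likely install set looks like this (package names can vary by platform; PyAudio needs PortAudio installed, and ffplay ships with FFmpeg):

```shell
pip install SpeechRecognition pyaudio openai-whisper anthropic openai
```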

Step 1: Build the Voice Input Loop

import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_for_command():
    with sr.Microphone() as source:
        # Calibrate for background noise before capturing speech
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Listening...")
        try:
            audio = recognizer.listen(source, timeout=10)
        except sr.WaitTimeoutError:
            # No speech started within 10 seconds
            return None

    try:
        # Transcribe locally with Whisper (requires the openai-whisper package)
        text = recognizer.recognize_whisper(audio, model="base")
        return text
    except sr.UnknownValueError:
        # Speech was unintelligible
        return None
    except sr.RequestError as e:
        print(f"Error: {e}")
        return None

Step 2: Process Commands with AI

import anthropic
import json

client = anthropic.Anthropic()

ASSISTANT_PROMPT = """You are a voice-controlled business assistant.
Available commands:
- Query data (leads, revenue, tasks, calendar)
- Create tasks and reminders
- Get briefings and summaries
- Look up client information

Respond concisely. Spoken responses should be under 3 sentences.
If you need to take an action, respond with JSON: {"action": "...", "params": {...}}
If you are answering a question, respond with plain text."""

def process_command(command_text, context=None):
    # The Messages API requires the conversation to start with a user turn,
    # so prior context is folded into the prompt rather than inserted as a
    # leading assistant message.
    if context:
        command_text = f"(Your previous response: {context})\n\n{command_text}"

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        system=ASSISTANT_PROMPT,
        messages=[{"role": "user", "content": command_text}]
    )
    return response.content[0].text

Step 3: Execute Actions

# Map action names to your own integrations. query_database, create_task,
# generate_daily_briefing, and lookup_client are whatever your stack provides.
ACTION_HANDLERS = {
    "query_leads": lambda p: query_database("SELECT COUNT(*) FROM leads WHERE created_at > ?", [p.get("since", "today")]),
    "create_task": lambda p: create_task(p["title"], p.get("due", None)),
    "get_briefing": lambda p: generate_daily_briefing(),
    "lookup_client": lambda p: lookup_client(p["name"]),
}

def handle_response(response_text):
    # Action requests arrive as JSON; anything else is a plain spoken answer.
    try:
        action_data = json.loads(response_text)
        if isinstance(action_data, dict) and "action" in action_data:
            handler = ACTION_HANDLERS.get(action_data["action"])
            if handler:
                result = handler(action_data.get("params", {}))
                return str(result)
    except json.JSONDecodeError:
        pass
    return response_text
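The handlers above call functions you supply. As one hedged illustration of what such an integration might look like, here is a hypothetical create_task backed by SQLite; the table name, schema, and db_path default are all assumptions to adapt to your own task system:

```python
import sqlite3

def create_task(title, due=None, db_path="assistant.db"):
    # Hypothetical task store: one SQLite table, created on first use.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, title TEXT, due TEXT)"
    )
    conn.execute("INSERT INTO tasks (title, due) VALUES (?, ?)", (title, due))
    conn.commit()
    conn.close()
    confirmation = f"Task created: {title}"
    if due:
        confirmation += f", due {due}"
    return confirmation
```

The string it returns is what handle_response passes to speak, so keep confirmations short enough to be spoken naturally.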

Step 4: Add Text-to-Speech Output

from openai import OpenAI

tts_client = OpenAI()

def speak(text):
    # Stream synthesized speech to a file, then play it. The streaming
    # context manager avoids the deprecated stream_to_file-on-response pattern.
    with tts_client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text
    ) as response:
        response.stream_to_file("response.mp3")
    play_audio("response.mp3")

def play_audio(path):
    # ffplay ships with FFmpeg; -nodisp suppresses the video window
    import subprocess
    subprocess.run(["ffplay", "-nodisp", "-autoexit", path], capture_output=True)

Step 5: Run the Assistant Loop

def run_assistant():
    print("Voice assistant ready. Say 'exit' to stop.")
    context = None

    while True:
        command = listen_for_command()
        if command is None:
            continue

        print(f"You said: {command}")

        if "exit" in command.lower() or "stop" in command.lower():
            speak("Goodbye.")
            break

        response = process_command(command, context)
        result = handle_response(response)
        print(f"Assistant: {result}")
        speak(result)
        context = result

if __name__ == "__main__":
    run_assistant()

What to Build Next

Add wake word detection so the assistant only listens when addressed. "Hey Operations" triggers listening mode. This prevents accidental activations and makes the assistant feel more natural to use alongside normal conversation.
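Dedicated wake-word engines exist, but a minimal transcription-based version (checking each utterance for the phrase before entering command mode) can be sketched like this; the phrase and function names are assumptions:

```python
WAKE_PHRASE = "hey operations"  # assumed phrase; pick your own

def wait_for_wake_word(transcribe):
    # transcribe: any zero-argument function returning the latest utterance
    # as text (listen_for_command from Step 1 fits). Blocks until the wake
    # phrase is heard, then returns the utterance that contained it.
    while True:
        heard = transcribe()
        if heard and WAKE_PHRASE in heard.lower():
            return heard
```

In run_assistant, call wait_for_wake_word(listen_for_command) at the top of each loop iteration before listening for the actual command.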


Want this system built for your business?

Get a free assessment. We will map every system your business needs and show you the ROI.

Get Your Free Assessment
