How to Create an AI Text-to-Speech System for Content

Convert written content to natural-sounding audio using AI voices.

Jay Banlasan

The AI Systems Guy

An AI text-to-speech system for content narration converts blog posts, tutorials, and documentation into listenable audio. I build these for content teams that want to reach audiences who prefer listening to reading. For most content, modern TTS voices come close enough to human narration that many listeners won't notice the difference.

One blog post becomes a blog post AND a podcast episode with zero extra recording work.

What You Need Before Starting

Written content to convert (blog posts, articles, scripts)
Python 3.8+ with the OpenAI SDK (the examples use OpenAI; the ElevenLabs API is an alternative)
pydub plus ffmpeg, for joining audio chunks
Storage for audio files
A hosting solution for audio content
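Before running anything, a quick environment check saves a confusing first failure. A minimal sketch, assuming the OpenAI key lives in the `OPENAI_API_KEY` environment variable (the SDK's default) and that pydub will need ffmpeg on the PATH; `check_prerequisites` is a hypothetical helper, not part of either library.

```python
import importlib.util
import os
import shutil


def check_prerequisites():
    """Return a list of missing prerequisites; an empty list means ready."""
    missing = []
    # The OpenAI client reads its key from this variable unless one is passed in
    if not os.environ.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY environment variable")
    # pydub shells out to ffmpeg to decode and encode MP3 files
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg on PATH")
    # Check that the packages are installed without importing them
    for package in ("openai", "pydub"):
        if importlib.util.find_spec(package) is None:
            missing.append(f"Python package '{package}'")
    return missing


if __name__ == "__main__":
    problems = check_prerequisites()
    print("Ready" if not problems else "Missing: " + ", ".join(problems))
```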

Step 1: Generate Speech with OpenAI

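from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def text_to_speech(text, output_path, voice="alloy", model="tts-1-hd"):
    """Synthesize `text` and write the MP3 to `output_path`."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # The streaming response writes audio to disk as it arrives
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,  # alloy, echo, fable, onyx, nova, shimmer
        input=text,
    ) as response:
        response.stream_to_file(output_path)
    return output_path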

# Available voices
VOICES = {
    "alloy": "Neutral, balanced",
    "echo": "Male, warm",
    "fable": "Male, British accent",
    "onyx": "Male, deep",
    "nova": "Female, warm",
    "shimmer": "Female, clear"
}
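Speech synthesis is billed per input character, so it is worth estimating a post's cost and listening time before converting a whole archive. A rough sketch: the prices are assumptions based on OpenAI's published rates at the time of writing (about $15 per million characters for tts-1 and $30 for tts-1-hd), and the 150 words-per-minute narration pace is a guess, so check the current pricing page and adjust both. `estimate_tts_job` is a hypothetical helper.

```python
import math

# Assumed prices in USD per 1M input characters; verify against current pricing
PRICE_PER_MILLION_CHARS = {"tts-1": 15.00, "tts-1-hd": 30.00}
WORDS_PER_MINUTE = 150  # rough narration pace, an assumption


def estimate_tts_job(text, model="tts-1-hd", max_chars=4000):
    """Estimate cost, request count, and listening time for one piece of text."""
    chars = len(text)
    cost = chars / 1_000_000 * PRICE_PER_MILLION_CHARS[model]
    # A lower bound: sentence-based chunking can produce a few more requests
    requests = max(1, math.ceil(chars / max_chars))
    minutes = len(text.split()) / WORDS_PER_MINUTE
    return {
        "characters": chars,
        "requests": requests,
        "cost_usd": round(cost, 4),
        "minutes": round(minutes, 1),
    }


print(estimate_tts_job("word " * 1500))
```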

Step 2: Process Long Content

OpenAI's speech endpoint accepts at most 4,096 characters per request, so split long content into chunks at sentence boundaries:

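import re

def split_for_tts(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Split after ., ! or ? followed by whitespace, and on line breaks
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        # Hard-wrap any single sentence that exceeds the limit on its own
        while len(sentence) > max_chars:
            if current_chunk:
                chunks.append(current_chunk)
                current_chunk = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current_chunk} {sentence}".strip()
        if len(candidate) > max_chars:
            chunks.append(current_chunk)
            current_chunk = sentence
        else:
            current_chunk = candidate

    if current_chunk:
        chunks.append(current_chunk)

    return chunks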

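import os
import tempfile

def convert_long_text(text, output_path, voice="alloy"):
    chunks = split_for_tts(text)

    # Chunk files live in a temporary directory that is deleted afterwards
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_files = []
        for i, chunk in enumerate(chunks):
            chunk_path = os.path.join(tmp_dir, f"chunk_{i:03d}.mp3")
            text_to_speech(chunk, chunk_path, voice=voice)
            chunk_files.append(chunk_path)
        # Join while the chunk files still exist
        concatenate_audio(chunk_files, output_path)

    return output_path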
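Each chunk is an independent request, so the chunks of a long post can be synthesized concurrently, as long as the files are kept in their original order for concatenation. A minimal sketch with `concurrent.futures`; `synthesize` stands in for a wrapper around Step 1's `text_to_speech` and is passed in so the function can be tested without network calls. The worker count is an assumption: keep it low enough to stay within your account's rate limits.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def synthesize_chunks_parallel(chunks, out_dir, synthesize, max_workers=4):
    """Render each chunk to its own MP3 concurrently; return paths in chunk order.

    `synthesize(text, path)` is any function that writes audio for `text` to `path`,
    for example `lambda t, p: text_to_speech(t, p, voice="nova")`.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Paths are fixed up front, so output order never depends on completion order
    paths = [os.path.join(out_dir, f"chunk_{i:03d}.mp3") for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises a worker's exception
        # when that result is reached, so one failed chunk fails the whole post
        list(pool.map(synthesize, chunks, paths))
    return paths
```

The returned list can go straight into `concatenate_audio`.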

Step 3: Concatenate Audio Chunks

from pydub import AudioSegment  # requires ffmpeg on the PATH for MP3 support

def concatenate_audio(file_paths, output_path, pause_ms=500):
    combined = AudioSegment.empty()
    pause = AudioSegment.silent(duration=pause_ms)

    for i, path in enumerate(file_paths):
        if i > 0:
            combined += pause  # silence between chunks, not after the last one
        combined += AudioSegment.from_mp3(path)

    combined.export(output_path, format="mp3")
    return output_path
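pydub decodes every chunk and re-encodes the combined file. If you don't need the pauses, ffmpeg's concat demuxer can join MP3 chunks that share the same encoding settings without re-encoding (`-c copy`). A sketch under that assumption: it writes the list file the demuxer reads and returns the command, and only invokes ffmpeg when `run=True`. `write_concat_list` and `ffmpeg_concat` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def write_concat_list(file_paths, list_path):
    """Write the 'file ...' list format read by ffmpeg's concat demuxer."""
    lines = []
    for p in file_paths:
        # Inside single quotes, a literal ' is written as '\'' (close, escape, reopen)
        escaped = str(Path(p).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def ffmpeg_concat(file_paths, output_path, list_path="concat_list.txt", run=False):
    """Join same-format MP3s with stream copy: no re-encode and no pauses."""
    write_concat_list(file_paths, list_path)
    # -safe 0 allows the absolute paths written above
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
           "-i", str(list_path), "-c", "copy", str(output_path)]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```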

Step 4: Build a Content-to-Audio Pipeline

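import json
import os
import re

from pydub import AudioSegment

def get_audio_duration(path):
    """Return the length of an audio file in seconds."""
    return len(AudioSegment.from_file(path)) / 1000  # pydub lengths are in milliseconds

def save_audio_metadata(title, path, duration, voice):
    """Write a JSON sidecar next to the MP3 (a simple stand-in for a database)."""
    meta = {"title": title, "path": path, "duration_seconds": duration, "voice": voice}
    with open(os.path.splitext(path)[0] + ".json", "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)

def convert_blog_post(title, content, output_folder, voice="nova"):
    os.makedirs(output_folder, exist_ok=True)

    intro = f"This is {title}."
    full_text = f"{intro}\n\n{content}"

    # Filename-safe slug: lowercase, runs of non-alphanumerics become one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())[:50].strip("-") or "untitled"
    output_path = os.path.join(output_folder, f"{slug}.mp3")

    convert_long_text(full_text, output_path, voice=voice)

    duration = get_audio_duration(output_path)
    save_audio_metadata(title, output_path, duration, voice)

    return {"path": output_path, "duration_seconds": duration}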
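Blog content usually arrives as Markdown or HTML, and a TTS model will read leftover syntax (pound signs, asterisks, raw URLs) aloud. A minimal regex-based cleaner to run on `content` before `convert_blog_post`; the rules are assumptions covering common Markdown, not a full parser, so a proper Markdown or HTML library is the safer choice for complex posts. `clean_for_tts` is a hypothetical helper.

```python
import html
import re


def clean_for_tts(text):
    """Strip common Markdown/HTML syntax so it isn't read aloud."""
    text = re.sub(r"`{3}.*?`{3}", "", text, flags=re.S)          # fenced code blocks
    text = re.sub(r"<[^>]+>", "", text)                          # HTML tags
    text = html.unescape(text)                                   # &amp; -> &
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)             # images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)         # links: keep the text
    text = re.sub(r"https?://\S+", "", text)                     # bare URLs
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.M)  # heading markers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.M)   # bullet markers
    # Asterisks and backticks go; underscores stay so snake_case survives
    text = re.sub(r"[*`]", "", text)
    text = re.sub(r"[ \t]+", " ", text)                          # collapse spaces
    text = re.sub(r"\n{3,}", "\n\n", text)                       # collapse blank lines
    return text.strip()
```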

Step 5: Batch Convert Content

def batch_convert(articles, output_folder, voice="nova"):
    results = []
    for article in articles:
        try:
            result = convert_blog_post(article["title"], article["content"], output_folder, voice)
            results.append({"title": article["title"], "status": "success", **result})
        except Exception as e:
            results.append({"title": article["title"], "status": "failed", "error": str(e)})
    return results
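`batch_convert` records a failure and moves on, which is right for permanent errors but wasteful for transient ones such as rate limits or timeouts. A small retry wrapper with exponential backoff; the attempt count and delays are assumptions, and in practice you may want to retry only specific exception types (the OpenAI SDK raises `openai.RateLimitError` and `openai.APITimeoutError`, for example). `with_retries` is a hypothetical helper.

```python
import random
import time


def with_retries(fn, *args, attempts=3, base_delay=1.0, retry_on=(Exception,),
                 sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller record the failure
            # base_delay, 2x, 4x, ... plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

To use it, wrap the call inside `batch_convert`'s try block, for example `result = with_retries(convert_blog_post, article["title"], article["content"], output_folder, voice)`. The OpenAI client also retries some failures on its own (its `max_retries` option), so keep the outer attempt count modest.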

What to Build Next

Add automatic podcast feed generation. After converting blog posts to audio, generate an RSS feed that podcast apps can subscribe to. Your blog becomes a podcast with zero additional content creation.
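A podcast feed is an RSS 2.0 document in which each item carries an `<enclosure>` element pointing at the audio file, with its URL, size in bytes, and MIME type. A minimal sketch using the standard library's ElementTree; the feed fields and the episode dict shape are placeholders, and directories such as Apple Podcasts also expect extra `itunes:` tags (artwork, category, explicit flag) that are not shown here. `build_podcast_feed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def build_podcast_feed(channel_title, site_url, description, episodes):
    """Return RSS 2.0 XML (as a string) for a list of episode dicts.

    Each episode dict (a hypothetical shape) needs: title, audio_url,
    size_bytes, published (a timezone-aware datetime), and optionally description.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = description

    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "description").text = ep.get("description", ep["title"])
        # The enclosure is what podcast apps download: URL, length in bytes, MIME type
        ET.SubElement(item, "enclosure", url=ep["audio_url"],
                      length=str(ep["size_bytes"]), type="audio/mpeg")
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # RSS dates use RFC 822 style, e.g. "Mon, 01 Jan 2024 09:00:00 +0000"
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])

    # Declaration added by hand so the declared encoding is always UTF-8
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(rss, encoding="unicode")
```

`size_bytes` can come from `os.path.getsize()` on each MP3 produced by `convert_blog_post`; write the result to a file such as `feed.xml` next to the audio on your host.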

