How to Create an AI Text-to-Speech System for Content
Convert written content to natural-sounding audio using AI voices.
Jay Banlasan
The AI Systems Guy
An AI text-to-speech system for content narration converts blog posts, tutorials, and documentation into listenable audio. I build these for content teams that want to reach audiences who prefer listening to reading. For most informational content, modern TTS voices sound close to human narration.
One blog post becomes a blog post AND a podcast episode with zero extra recording work.
What You Need Before Starting
- Written content to convert (blog posts, articles, scripts)
- Python 3.8+ with the OpenAI SDK (used in the examples below) or the ElevenLabs API
- pydub and ffmpeg, for joining audio chunks in Step 3
- Storage for audio files
- A hosting solution for audio content
Step 1: Generate Speech with OpenAI
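from pathlib import Path

from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()


def text_to_speech(text, output_path, voice="alloy", model="tts-1-hd"):
    """Convert text to speech and save it to output_path as an MP3."""
    # Streaming the response writes audio to disk as it arrives
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,  # alloy, echo, fable, onyx, nova, shimmer
        input=text,
    ) as response:
        response.stream_to_file(Path(output_path))
    return output_path


# Available voices, with rough descriptions of how each sounds
VOICES = {
    "alloy": "Neutral, balanced",
    "echo": "Male, warm",
    "fable": "Male, British accent",
    "onyx": "Male, deep",
    "nova": "Female, warm",
    "shimmer": "Female, clear",
}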
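Before committing to a voice, it can help to hear each one read the same sentence. The sketch below is one way to do that. `sample_all_voices` and its `synthesize` parameter are hypothetical names, not part of the OpenAI SDK. Passing the synthesis function in as an argument lets you plug in `text_to_speech` from above, or a stub while testing.

```python
from pathlib import Path

# Voice names taken from the list above
VOICE_NAMES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def sample_all_voices(synthesize, sample_text, output_folder="voice_samples"):
    """Render the same text once per voice and return {voice: file path}.

    `synthesize` is any callable with the signature
    synthesize(text, output_path, voice=...), such as text_to_speech.
    """
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = {}
    for voice in VOICE_NAMES:
        out = folder / f"sample_{voice}.mp3"
        synthesize(sample_text, str(out), voice=voice)
        paths[voice] = str(out)
    return paths
```

Calling `sample_all_voices(text_to_speech, "Welcome to the show.")` would produce six short files to compare side by side.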
Step 2: Process Long Content
TTS APIs cap the input length per request. OpenAI's speech endpoint accepts up to 4,096 characters, so split long content into chunks that stay under the limit:
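def split_for_tts(text, max_chars=4000):
    """Split text into chunks of up to max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole.
    """
    # Put each sentence on its own line, then split on line breaks
    sentences = text.replace(". ", ".\n").split("\n")
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        # Flush only a non-empty chunk; +1 counts the joining space
        if current_chunk and len(current_chunk) + len(sentence) + 1 > max_chars:
            chunks.append(current_chunk.strip())
            current_chunk = sentence
        else:
            current_chunk += " " + sentence
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks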
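import os


def convert_long_text(text, output_path, voice="alloy"):
    """Convert text of any length by synthesizing chunks and joining them."""
    chunks = split_for_tts(text)
    os.makedirs("temp", exist_ok=True)  # chunk files are written here
    chunk_files = []
    for i, chunk in enumerate(chunks):
        chunk_path = f"temp/chunk_{i:03d}.mp3"
        text_to_speech(chunk, chunk_path, voice=voice)
        chunk_files.append(chunk_path)
    # concatenate_audio is defined in Step 3
    concatenate_audio(chunk_files, output_path)
    return output_path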
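The splitter above breaks only on a period followed by a space, and a single sentence longer than `max_chars` passes through unsplit. A slightly more defensive variant, sketched here under the assumption that sentences end in `.`, `!`, or `?` followed by whitespace, also wraps any over-long sentence at word boundaries. `split_sentences_for_tts` is a hypothetical name, and the regex is a heuristic that will mis-split after abbreviations such as "e.g.".

```python
import re
import textwrap


def split_sentences_for_tts(text, max_chars=4000):
    """Split text into chunks no longer than max_chars.

    Breaks after ., !, or ? followed by whitespace. A sentence longer than
    max_chars is wrapped at word boundaries as a fallback.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pieces = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces.append(sentence)
        else:
            # Fallback: wrap the over-long sentence at spaces
            pieces.extend(textwrap.wrap(sentence, width=max_chars))
    chunks = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Because every piece is already within the limit before chunks are assembled, no chunk can exceed `max_chars`, which is the property the API call depends on.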
Step 3: Concatenate Audio Chunks
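from pydub import AudioSegment  # pydub needs ffmpeg installed to read and write MP3


def concatenate_audio(file_paths, output_path, pause_ms=500):
    """Join MP3 files into one, with a short pause between chunks."""
    combined = AudioSegment.empty()
    pause = AudioSegment.silent(duration=pause_ms)
    for i, path in enumerate(file_paths):
        if i > 0:
            combined += pause  # pause between chunks, not after the last one
        combined += AudioSegment.from_mp3(path)
    combined.export(output_path, format="mp3")
    return output_path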
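Fixed chunk paths such as `temp/chunk_000.mp3` are left on disk after each run and can collide if two conversions run at once. One way around that, sketched here, is to write chunks into a `tempfile.TemporaryDirectory` that is deleted when the block exits. `convert_with_tempdir` is a hypothetical name. The `synthesize` and `concatenate` parameters stand in for `text_to_speech` and `concatenate_audio`, and `chunks` is the list returned by `split_for_tts`.

```python
import os
import tempfile


def convert_with_tempdir(chunks, output_path, synthesize, concatenate, voice="alloy"):
    """Synthesize each chunk into a throwaway folder, then join the files.

    synthesize(text, path, voice=...) writes one audio file.
    concatenate(paths, output_path) joins them into the final file.
    """
    with tempfile.TemporaryDirectory() as tmp:
        chunk_files = []
        for i, chunk in enumerate(chunks):
            chunk_path = os.path.join(tmp, f"chunk_{i:03d}.mp3")
            synthesize(chunk, chunk_path, voice=voice)
            chunk_files.append(chunk_path)
        # Join while the temporary files still exist
        concatenate(chunk_files, output_path)
    # The temporary folder and its chunk files are gone at this point
    return output_path
```

A call might look like `convert_with_tempdir(split_for_tts(text), "post.mp3", text_to_speech, concatenate_audio)`.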
Step 4: Build a Content-to-Audio Pipeline
import os
import re


def convert_blog_post(title, content, output_folder, voice="nova"):
    """Convert one post to an MP3 and return its path and duration."""
    os.makedirs(output_folder, exist_ok=True)
    intro = f"This is {title}."
    full_text = f"{intro}\n\n{content}"
    # Keep only letters and digits so the slug is a safe filename
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:50]
    output_path = os.path.join(output_folder, f"{slug}.mp3")
    convert_long_text(full_text, output_path, voice=voice)
    # get_audio_duration and save_audio_metadata are helpers you supply
    duration = get_audio_duration(output_path)
    save_audio_metadata(title, output_path, duration, voice)
    return {"path": output_path, "duration_seconds": duration}
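`convert_blog_post` calls `get_audio_duration` and `save_audio_metadata`, which the article does not define. The versions below are assumptions about what they might do, not the author's implementation. The first measures length with pydub (already used in Step 3), and the second appends one JSON record per file to a hypothetical `audio_metadata.jsonl` log.

```python
import json
from datetime import datetime, timezone


def get_audio_duration(path):
    """Return the audio length in seconds (assumes pydub and ffmpeg are installed)."""
    from pydub import AudioSegment  # imported here so the rest works without pydub

    # len() of an AudioSegment is its duration in milliseconds
    return len(AudioSegment.from_file(path)) / 1000.0


def save_audio_metadata(title, path, duration, voice, log_path="audio_metadata.jsonl"):
    """Append one JSON line describing a generated audio file."""
    record = {
        "title": title,
        "path": path,
        "duration_seconds": duration,
        "voice": voice,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

A JSON Lines log is easy to append to and to read back later, for example when building a feed from the generated files. A database table would serve the same purpose.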
Step 5: Batch Convert Content
def batch_convert(articles, output_folder, voice="nova"):
    results = []
    for article in articles:
        try:
            result = convert_blog_post(
                article["title"], article["content"], output_folder, voice
            )
            results.append({"title": article["title"], "status": "success", **result})
        except Exception as e:
            # Record the failure and continue with the remaining articles
            results.append({"title": article["title"], "status": "failed", "error": str(e)})
    return results
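A long batch makes many API calls, and a transient failure such as a rate limit or network timeout would otherwise mark the whole article as failed. A retry wrapper with exponential backoff is one common remedy. In this sketch `with_retries` is a hypothetical helper. It retries on any exception, which is a simplification, and in practice you would likely narrow it to the SDK's rate-limit and connection errors.

```python
import time


def with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap func so failed calls are retried with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised once max_attempts is reached.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == max_attempts:
                    raise
                sleep(base_delay * 2 ** (attempt - 1))
    return wrapper
```

For example, `safe_tts = with_retries(text_to_speech)` gives a drop-in replacement that could be called inside `convert_long_text`. The `sleep` parameter exists only so the delay can be replaced in tests.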
What to Build Next
Add automatic podcast feed generation. After converting blog posts to audio, generate an RSS feed that podcast apps can subscribe to. Your blog becomes a podcast with zero additional content creation.
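As a starting point for that feed, the sketch below builds a bare RSS 2.0 document with one item and enclosure per episode, using only the standard library. `build_podcast_feed`, the episode dictionary keys, and the example URLs are all assumptions made for illustration. Podcast directories typically expect additional tags (such as the `itunes:` namespace, artwork, and categories) that are left out here.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def build_podcast_feed(title, site_url, description, episodes):
    """Return an RSS 2.0 feed as a string.

    Each episode is a dict with keys: title, audio_url, size_bytes,
    published (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # RSS expects RFC 822 style dates
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The enclosure tells podcast apps where the audio file is
        ET.SubElement(
            item,
            "enclosure",
            url=ep["audio_url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")
```

The `size_bytes` value can come from `os.path.getsize` on the generated MP3, and the returned string can be written to a file served alongside the audio.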
Related Reading
- AI for Content Creation at Scale - TTS as part of content repurposing
- The One Person Company Is Here - one person running a multi-format content operation
- Build vs Buy: The AI Framework - custom TTS pipeline vs podcast recording services