TTS Backends

audia supports three TTS backends, selectable via AUDIA_TTS_BACKEND.

edge-tts (default)

Microsoft Edge’s TTS service, accessed via the edge-tts library.

AUDIA_TTS_BACKEND=edge-tts
AUDIA_TTS_VOICE=en-US-AriaNeural
AUDIA_TTS_RATE=+0%

List all available voices:

edge-tts --list-voices
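The same list can be narrowed from Python. The sketch below is a minimal example, assuming the `edge-tts` package is installed and that `edge_tts.list_voices()` returns dictionaries with `ShortName` and `Locale` keys; the helper names `filter_voices` and `english_voice_names` are hypothetical and not part of audia.

```python
import asyncio


def filter_voices(voices, locale_prefix="en-"):
    """Return sorted ShortName values whose Locale starts with locale_prefix."""
    return sorted(
        voice["ShortName"]
        for voice in voices
        if voice.get("Locale", "").startswith(locale_prefix)
    )


async def english_voice_names():
    """Fetch the voice list from the Edge TTS service and keep English voices."""
    # Imported lazily so filter_voices works without edge-tts installed.
    import edge_tts

    voices = await edge_tts.list_voices()  # requires network access
    return filter_voices(voices, "en-")


def print_english_voices():
    """Print one English voice name per line (performs a network request)."""
    print("\n".join(asyncio.run(english_voice_names())))
```

Calling `print_english_voices()` prints names that can be used as values for AUDIA_TTS_VOICE.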

Popular English voices:

kokoro

Kokoro is a local neural TTS model: it needs no internet connection and no API key, and a GPU is optional.

pip install "audia[kokoro]"

AUDIA_TTS_BACKEND=kokoro
AUDIA_TTS_VOICE=af_heart

See the Kokoro documentation for the full voice list.

openai

High-quality TTS via the OpenAI API. This backend requires an OpenAI API key and incurs usage costs.

AUDIA_TTS_BACKEND=openai
AUDIA_TTS_VOICE=nova
# AUDIA_OPENAI_API_KEY is required

Available voices: alloy, echo, nova, shimmer, onyx, fable
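The backend settings described above can be checked before a run. The following is a minimal sketch, assuming the settings are read from environment variables; `resolve_tts_settings` and `SUPPORTED_BACKENDS` are hypothetical names, and audia's own configuration loading may work differently.

```python
import os

# Backend identifiers taken from the sections above.
SUPPORTED_BACKENDS = ("edge-tts", "kokoro", "openai")


def resolve_tts_settings(env=None):
    """Validate the TTS settings and return them as a dictionary.

    `env` is any mapping of setting names to values; it defaults to os.environ.
    """
    env = os.environ if env is None else env

    # edge-tts is the documented default backend.
    backend = env.get("AUDIA_TTS_BACKEND", "edge-tts")
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown AUDIA_TTS_BACKEND {backend!r}; "
            f"expected one of {', '.join(SUPPORTED_BACKENDS)}"
        )

    # Only the openai backend needs an API key.
    if backend == "openai" and not env.get("AUDIA_OPENAI_API_KEY"):
        raise ValueError(
            "AUDIA_OPENAI_API_KEY is required when AUDIA_TTS_BACKEND=openai"
        )

    # The voice is returned as given; None means it was not set.
    return {"backend": backend, "voice": env.get("AUDIA_TTS_VOICE")}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.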

Chunking

Long texts are split into chunks before synthesis. The default chunk size of 3800 characters works well for edge-tts and stays below OpenAI’s hard limit of 4096 characters per request.

AUDIA_TTS_CHUNK_CHARS=3800

All chunks are synthesised independently and then concatenated into one final .mp3.
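The chunk-and-concatenate flow described above can be sketched as follows. This is an illustration under stated assumptions, not audia's actual implementation: splitting on sentence boundaries and joining the audio bytes directly are both assumptions, and `split_into_chunks` and `synthesise_long_text` are hypothetical names. The `synthesise` argument stands in for whichever backend call turns one chunk of text into audio bytes.

```python
import re


def split_into_chunks(text, max_chars=3800):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue

        # Grow the current chunk while the next sentence still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks


def synthesise_long_text(text, synthesise, max_chars=3800):
    """Synthesise each chunk independently and join the resulting bytes."""
    # Assumption: the per-chunk audio can be joined byte-wise.
    return b"".join(synthesise(chunk) for chunk in split_into_chunks(text, max_chars))
```

With `max_chars=10`, the text `"One. Two! Three?"` is split into `"One. Two!"` and `"Three?"`, because adding the third sentence would exceed the limit.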