speakturbo-tts

by @emzod in AI & LLM

# Install this skill:

npx skills add EmZod/Speak-Turbo

Or install a specific skill: npx add-skill https://github.com/EmZod/Speak-Turbo

# Description

Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) and voice synthesis with ~90ms latency to first audio. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.

# SKILL.md

name: speakturbo-tts
description: Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) and voice synthesis with ~90ms latency to first audio. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.

speakturbo - Talk to your Claude!

Give your agent the ability to speak to you in real time. Ultra-fast text-to-speech with ~90ms latency to first sound and 8 built-in voices.

Quick Start

# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms

# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav  # Should show ~50-100KB file

Output explained: ⚡ = first audio received, ▶ = playback started, ✓ = done
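If an agent or script needs these timings as numbers, the status line can be parsed. The following is a minimal Python sketch that assumes the line always has the shape shown in the example above; the key names (`first_audio_ms`, `playback_start_ms`, `done_ms`) are invented for this illustration and are not produced by the CLI.

```python
import re

# Matches one "<symbol> <number>ms" stage, e.g. "⚡ 92ms".
STAGE = re.compile(r"(⚡|▶|✓)\s*(\d+)ms")

# Key names chosen for this sketch; the CLI itself only prints the symbols.
NAMES = {"⚡": "first_audio_ms", "▶": "playback_start_ms", "✓": "done_ms"}


def parse_status(line: str) -> dict[str, int]:
    """Return the millisecond value for each stage found in a status line."""
    return {NAMES[symbol]: int(ms) for symbol, ms in STAGE.findall(line)}


timings = parse_status("⚡ 92ms → ▶ 93ms → ✓ 1245ms")
print(timings)
# {'first_audio_ms': 92, 'playback_start_ms': 93, 'done_ms': 1245}
```

A line with no recognizable stages yields an empty dict, so callers can treat that as "no timing information".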

First Run

The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Once the daemon is running, subsequent calls reach first sound in ~90ms.

# First run (slow - daemon starting)
speakturbo "Starting up"  # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"  # ~90ms

Usage

# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices
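When calling the CLI from a script rather than a shell, it can help to build the argument list in one place. This is a small Python sketch using only the flags listed above (`-o`, `-q`); the helper names `build_command` and `say` are hypothetical, and combining `-o` with `-q` in one call is an assumption.

```python
import subprocess
from typing import Optional


def build_command(text: str, output: Optional[str] = None, quiet: bool = False) -> list[str]:
    """Build the argument list for one speakturbo call."""
    cmd = ["speakturbo", text]
    if output is not None:
        cmd += ["-o", output]   # save to a WAV file instead of playing
    if quiet:
        cmd.append("-q")        # suppress status messages
    return cmd


def say(text: str, **options) -> int:
    """Run speakturbo and return its exit code (requires the CLI on PATH)."""
    return subprocess.run(build_command(text, **options)).returncode


print(build_command("Hello", output="output.wav", quiet=True))
# ['speakturbo', 'Hello', '-o', 'output.wav', '-q']
```

Because the text is passed as a separate list element, no shell quoting is involved.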

Available Voices

Voice	Type
`alba`	Female (default)
`marius`	Male
`javert`	Male
`jean`	Male
`fantine`	Female
`cosette`	Female
`eponine`	Female
`azelma`	Female

Performance

Metric	Value
Time to first sound	~90ms (daemon warm)
First run	2-5s (daemon startup)
Real-time factor	~4x faster than real time
Sample rate	24kHz mono
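These figures allow rough estimates of file size and synthesis time. The sketch below assumes 16-bit PCM samples and a 44-byte WAV header, neither of which is stated in this document, so treat the results as approximations.

```python
SAMPLE_RATE_HZ = 24_000      # from the table above
CHANNELS = 1                 # mono
BYTES_PER_SAMPLE = 2         # assumption: 16-bit PCM
WAV_HEADER_BYTES = 44        # assumption: canonical PCM WAV header
REAL_TIME_FACTOR = 4.0       # "~4x faster" than real time


def wav_size_bytes(duration_s: float) -> int:
    """Approximate size of a PCM WAV file holding duration_s seconds of audio."""
    return WAV_HEADER_BYTES + round(duration_s * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE)


def generation_time_s(duration_s: float) -> float:
    """Approximate time needed to synthesize duration_s seconds of audio."""
    return duration_s / REAL_TIME_FACTOR


print(wav_size_bytes(1.2))       # 57644
print(generation_time_s(10.0))   # 2.5
```

Under these assumptions, about 1.2 seconds of speech comes to roughly 56KB, which falls inside the ~50-100KB range quoted for `test.wav` in Quick Start.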

Architecture

speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)

Text Input

Encoding: UTF-8
Quotes in text: Use escaping: speakturbo "She said \"hello\""
Long text: Supported, streams as it generates
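When the text comes from a program rather than being typed by hand, manual escaping is easy to get wrong. The following Python sketch shows two standard-library approaches; it only constructs the commands and does not run them.

```python
import shlex

text = 'She said "hello"'

# Option 1: pass an argument list to subprocess.run. No shell is involved,
# so the quotes need no escaping at all.
argv = ["speakturbo", text]

# Option 2: build a single shell command string. shlex.quote adds the
# quoting a POSIX shell needs.
command = "speakturbo " + shlex.quote(text)

print(command)   # speakturbo 'She said "hello"'

# Splitting the string the way a POSIX shell would recovers the same arguments.
assert shlex.split(command) == argv
```

Option 1 is usually the safer choice, since it avoids shell interpretation entirely.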

Exit Codes

Code	Meaning
0	Success (audio played/saved)
1	Error (daemon connection failed, invalid args)
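A caller can turn these exit codes into exceptions. This is a minimal Python sketch: `SpeakTurboError` and `speak` are hypothetical names, and the `run` parameter exists only so the function can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable


class SpeakTurboError(RuntimeError):
    """Raised when speakturbo exits with a non-zero code (hypothetical name)."""


def speak(text: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> None:
    """Speak text, raising SpeakTurboError unless the CLI exits with 0."""
    result = run(["speakturbo", text])
    if result.returncode != 0:
        raise SpeakTurboError(f"speakturbo exited with code {result.returncode}")
```

Since code 1 covers both connection failures and invalid arguments, the exit code alone does not say which one occurred; the log file mentioned under Daemon Management may help there.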

When to Use

Use speakturbo when:
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient

Use speak instead when:
- You need custom voice cloning (Morgan Freeman, etc.), for example: speak "text" --voice ~/.chatter/voices/morgan_freeman.wav
- You need emotion tags like [laugh], [sigh]
- Quality/variety matters more than speed

See the speak skill documentation for full usage.
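For an agent that has both skills installed, the guidance above reduces to a simple rule. The function below is an illustrative Python sketch; its name and parameters are invented for this example.

```python
def choose_tool(needs_cloning: bool = False,
                needs_emotion_tags: bool = False,
                prefers_quality: bool = False) -> str:
    """Return "speak" or "speakturbo" following the guidance above."""
    if needs_cloning or needs_emotion_tags or prefers_quality:
        return "speak"        # slower, but supports cloning and emotion tags
    return "speakturbo"       # ~90ms to first sound, built-in voices only


print(choose_tool())                          # speakturbo
print(choose_tool(needs_emotion_tags=True))   # speak
```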

Troubleshooting

No audio plays:

# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav  # macOS
aplay /tmp/test.wav   # Linux
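The same health check can be done from Python with the standard library. This sketch assumes the response has the `status` and `voices` fields shown in the expected output above; any other response shape is treated as "not ready".

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:7125/health"


def parse_health(body: str) -> tuple[bool, list[str]]:
    """Return (is_ready, voices) from a /health response body."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return False, []
    return data.get("status") == "ready", list(data.get("voices", []))


def daemon_ready(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """True if the daemon answers /health with status "ready"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_health(response.read().decode("utf-8"))[0]
    except (urllib.error.URLError, OSError, ValueError):
        return False  # not running, not reachable, or unexpected response


print(parse_health('{"status":"ready","voices":["alba","marius"]}'))
# (True, ['alba', 'marius'])
```

If `daemon_ready()` returns False, running any `speakturbo` command should start the daemon, as described under Daemon Management.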

Daemon won't start:

# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"  # Auto-restarts daemon

First run is slow:
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).

Daemon Management

The daemon auto-starts on first use and auto-shuts down after 1 hour idle.

# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
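To make the idle shutdown behavior concrete, here is one way such a timeout could be tracked. This Python sketch is purely illustrative and is not taken from the daemon's source; the class name and the injectable `clock` parameter are assumptions made for the example.

```python
import time
from typing import Callable

IDLE_LIMIT_S = 60 * 60  # 1 hour, as described above


class IdleTimer:
    """Tracks time since the last request (illustrative; not the daemon's code)."""

    def __init__(self, limit_s: float = IDLE_LIMIT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit_s = limit_s
        self._clock = clock
        self._last_used = clock()

    def touch(self) -> None:
        """Record activity; call this on every request."""
        self._last_used = self._clock()

    def expired(self) -> bool:
        """True once the idle period has reached the limit."""
        return self._clock() - self._last_used >= self._limit_s
```

A server using this pattern would call `touch()` on each request and check `expired()` periodically, exiting when it returns True. A monotonic clock is used so that system clock adjustments do not affect the idle measurement.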

Comparison with speak

Feature	speakturbo	speak
Time to first sound	~90ms	~4-8s
Voice cloning	❌	✅
Emotion tags	❌	✅
Voices	8 built-in	Custom wav files
Engine	pocket-tts	Chatterbox

# README.md

     ███████╗██████╗ ███████╗ █████╗ ██╗  ██╗ ████████╗██╗   ██╗██████╗ ██████╗  ██████╗ 
     ██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝ ╚══██╔══╝██║   ██║██╔══██╗██╔══██╗██╔═══██╗
     ███████╗██████╔╝█████╗  ███████║█████╔╝     ██║   ██║   ██║██████╔╝██████╔╝██║   ██║
     ╚════██║██╔═══╝ ██╔══╝  ██╔══██║██╔═██╗     ██║   ██║   ██║██╔══██╗██╔══██╗██║   ██║
     ███████║██║     ███████╗██║  ██║██║  ██╗    ██║   ╚██████╔╝██║  ██║██████╔╝╚██████╔╝
     ╚══════╝╚═╝     ╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝    ╚═╝    ╚═════╝ ╚═╝  ╚═╝╚═════╝  ╚═════╝

Talk to your Claude.

~90ms to first sound. Local. Private. Fast.

speakturbo "Hello world" → ⚡ 92ms → ▶ 93ms → ✓ done

Install

For AI Agents (Claude Code, Cursor, Windsurf):

npx skills add EmZod/Speak-Turbo

CLI only:

pip install pocket-tts uvicorn fastapi
cd speakturbo-cli && cargo build --release

Usage

speakturbo "Hello world"              # Play instantly
speakturbo "Hello" -o out.wav         # Save to file
speakturbo "Hello" -q                 # Quiet mode
speakturbo --list-voices              # Show voices

Voices

alba      ██████████  Female (default)
marius    ██████████  Male
javert    ██████████  Male  
jean      ██████████  Male
fantine   ██████████  Female
cosette   ██████████  Female
eponine   ██████████  Female
azelma    ██████████  Female

Performance

Time to first sound    ░░░░░░░░░░░░░░░░░░░░  ~90ms
First run (cold)       ████░░░░░░░░░░░░░░░░  2-5s  
Real-time factor       ████████████████░░░░  4x faster

Architecture

                    ┌─────────────────┐
                    │   speakturbo    │
                    │   (Rust, 2.2MB) │
                    └────────┬────────┘
                             │ HTTP :7125
                             ▼
                 ┌───────────────────────┐
                 │        daemon         │
                 │ (Python + pocket-tts) │
                 └───────────┬───────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Audio Output   │
                    │    (rodio)      │
                    └─────────────────┘

Troubleshooting

Problem	Fix
No audio	`curl http://127.0.0.1:7125/health`
Daemon stuck	`pkill -f "daemon_streaming"`
Slow first run	Normal - model loading (2-5s)

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

