npx skills add EmZod/Speak-Turbo
Or install specific skill: npx add-skill https://github.com/EmZod/Speak-Turbo
# Description
Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
# SKILL.md
name: speakturbo-tts
description: Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
speakturbo - Talk to your Claude!
Give your agent the ability to speak to you real-time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.
Quick Start
# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms
# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav # Should show ~50-100KB file
Output explained: ⚡ = first audio received, ▶ = playback started, ✓ = done
First Run
The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Subsequent calls are ~90ms to first sound.
# First run (slow - daemon starting)
speakturbo "Starting up" # ~2-5 seconds
# Second run (fast - daemon already running)
speakturbo "Now I'm fast" # ~90ms
Usage
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"
# Save to file (no audio playback)
speakturbo "Hello" -o output.wav
# Save to specific file
speakturbo "Goodbye" -o goodbye.wav
# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q
# List available voices
speakturbo --list-voices
Available Voices
| Voice | Type |
|---|---|
| alba | Female (default) |
| marius | Male |
| javert | Male |
| jean | Male |
| fantine | Female |
| cosette | Female |
| eponine | Female |
| azelma | Female |
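When a script takes a voice name from user input, it can help to check it against the eight built-in names first. The sketch below only validates the name. It does not assume any particular voice-selection flag, since none is documented here. `SPEAKTURBO_VOICES` and `is_builtin_voice` are hypothetical names introduced for illustration.

```shell
# Built-in voice names, copied from the table above.
SPEAKTURBO_VOICES="alba marius javert jean fantine cosette eponine azelma"

# is_builtin_voice NAME
# Hypothetical helper: returns 0 if NAME is one of the eight built-in voices.
is_builtin_voice() {
  for candidate in $SPEAKTURBO_VOICES; do
    if [ "$candidate" = "$1" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `is_builtin_voice alba` succeeds, while `is_builtin_voice morgan_freeman` fails, which is a hint to use the speak skill instead.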
Performance
| Metric | Value |
|---|---|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
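These figures allow a rough sanity check of output file sizes and generation times. The sketch below assumes 16-bit PCM samples and a 44-byte WAV header; the table only states 24kHz mono, so the bit depth is an assumption. It also reads "~4x" as meaning synthesis takes about a quarter of the audio's duration. Both function names are hypothetical.

```shell
# estimate_wav_bytes DURATION_MS
# Assumes 16-bit PCM (2 bytes per sample) and a 44-byte WAV header;
# the source only states 24 kHz mono.
estimate_wav_bytes() {
  duration_ms=$1
  sample_rate=24000
  bytes_per_sample=2
  header_bytes=44
  echo $(( header_bytes + duration_ms * sample_rate * bytes_per_sample / 1000 ))
}

# estimate_generation_ms DURATION_MS
# Assumes a real-time factor of ~4x: synthesis takes about 1/4 of the audio length.
estimate_generation_ms() {
  echo $(( $1 / 4 ))
}
```

Under these assumptions, a clip of about 1.2 seconds comes out near 58KB, which is in line with the ~50-100KB file size quoted in Quick Start.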
Architecture
speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)
Text Input
- Encoding: UTF-8
- Quotes in text: Use escaping: `speakturbo "She said \"hello\""`
- Long text: Supported, streams as it generates
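As an alternative to backslash escaping, the text can be kept in a shell variable or file and passed as a single quoted argument. This is a minimal sketch that relies only on standard shell quoting. `speak_text` and `speak_file` are hypothetical wrappers, not part of the CLI.

```shell
# speak_text TEXT
# Hypothetical wrapper: passes TEXT to speakturbo as exactly one argument.
speak_text() {
  speakturbo "$1"
}

# speak_file PATH
# Hypothetical wrapper: speaks the contents of a UTF-8 text file.
# Command substitution strips trailing newlines; the surrounding double
# quotes keep embedded quotes and spaces intact.
speak_file() {
  speakturbo "$(cat "$1")"
}
```

With single quotes on the calling side, no escaping is needed: `speak_text 'She said "hello"'`.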
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
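A calling script can branch on these exit codes. The sketch below wraps a quiet-mode call and reports a non-zero status on stderr. `speak_or_report` is a hypothetical helper, and the wording of the message is illustrative only.

```shell
# speak_or_report TEXT
# Hypothetical wrapper: plays TEXT in quiet mode (-q) and reports failures.
# Per the table above, a non-zero code means a daemon connection failure
# or invalid arguments.
speak_or_report() {
  if speakturbo "$1" -q; then
    return 0
  else
    status=$?
    echo "speakturbo failed with exit code $status" >&2
    return "$status"
  fi
}
```

Because the wrapper returns the original status, callers can still chain it, for example `speak_or_report "Build finished" || echo "falling back to text output"`.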
When to Use
Use speakturbo when:
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient
Use speak instead when:
- You need custom voice cloning (Morgan Freeman, etc.)
  → speak "text" --voice ~/.chatter/voices/morgan_freeman.wav
- You need emotion tags like [laugh], [sigh]
- Quality/variety matters more than speed
See the speak skill documentation for full usage.
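The choice between the two tools can be partly automated. The sketch below is a simple heuristic, not part of either skill: it picks `speak` only when the text contains one of the two emotion tags named above and otherwise defaults to `speakturbo`. `choose_tts` is a hypothetical name, and other tags supported by speak are not detected.

```shell
# choose_tts TEXT
# Hypothetical heuristic: prints "speak" when TEXT contains the literal
# tag [laugh] or [sigh], otherwise prints "speakturbo".
choose_tts() {
  case "$1" in
    *"[laugh]"*|*"[sigh]"*) echo "speak" ;;
    *) echo "speakturbo" ;;
  esac
}
```

For example, `choose_tts 'Well [sigh] fine'` prints `speak`, and `choose_tts 'Build finished'` prints `speakturbo`.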
Troubleshooting
No audio plays:
# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}
# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav # macOS
aplay /tmp/test.wav # Linux
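The manual playback step differs by platform, so a script can select the player from the OS name. This sketch covers only the two players mentioned above. `pick_player` is a hypothetical helper that takes the OS name as an optional argument and defaults to `uname -s`.

```shell
# pick_player [OS_NAME]
# Prints the WAV player for the given OS (default: output of `uname -s`).
# Returns 1 for systems not covered by the steps above.
pick_player() {
  os_name=${1:-$(uname -s)}
  case "$os_name" in
    Darwin) echo "afplay" ;;
    Linux) echo "aplay" ;;
    *) return 1 ;;
  esac
}
```

A typical use would be `player=$(pick_player) && "$player" /tmp/test.wav`.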
Daemon won't start:
# Check port availability
lsof -i :7125
# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test" # Auto-restarts daemon
First run is slow:
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).
Daemon Management
The daemon auto-starts on first use and auto-shuts down after 1 hour idle.
# Check status
curl http://127.0.0.1:7125/health
# Manual stop
pkill -f "daemon_streaming"
# View logs
cat /tmp/speakturbo.log
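After a manual stop, or while the first call is still loading the model, a script may want to wait until the health endpoint responds. The sketch below polls the URL shown above once per second. It only checks for a successful HTTP response and does not parse the `status` field. `daemon_ready` and `wait_for_daemon` are hypothetical helpers, and since the daemon starts on first use, polling alone will not start it.

```shell
HEALTH_URL="http://127.0.0.1:7125/health"

# daemon_ready
# Succeeds when the health endpoint answers with a successful HTTP status.
daemon_ready() {
  curl -sf --max-time 2 "$HEALTH_URL" >/dev/null 2>&1
}

# wait_for_daemon [TRIES]
# Hypothetical helper: polls once per second, up to TRIES times (default 10).
# Returns 0 as soon as the daemon responds, 1 if it never does.
wait_for_daemon() {
  tries=${1:-10}
  while [ "$tries" -gt 0 ]; do
    if daemon_ready; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

One way to use it is to start a first call in the background and then run `wait_for_daemon 10` before sending latency-sensitive requests.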
Comparison with speak
| Feature | speakturbo | speak |
|---|---|---|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |
# README.md
Talk to your Claude.
~90ms to first sound. Local. Private. Fast.
speakturbo "Hello world" → ⚡ 92ms → ▶ 93ms → ✓ done
Install
For AI Agents (Claude Code, Cursor, Windsurf):
npx skills add EmZod/Speak-Turbo
CLI only:
pip install pocket-tts uvicorn fastapi
cd speakturbo-cli && cargo build --release
Usage
speakturbo "Hello world" # Play instantly
speakturbo "Hello" -o out.wav # Save to file
speakturbo "Hello" -q # Quiet mode
speakturbo --list-voices # Show voices
Voices
| Voice | Type |
|---|---|
| alba | Female (default) |
| marius | Male |
| javert | Male |
| jean | Male |
| fantine | Female |
| cosette | Female |
| eponine | Female |
| azelma | Female |
Performance
| Metric | Value |
|---|---|
| Time to first sound | ~90ms |
| First run (cold) | 2-5s |
| Real-time factor | 4x faster |
Architecture
┌─────────────────┐
│   speakturbo    │
│  (Rust, 2.2MB)  │
└────────┬────────┘
         │ HTTP :7125
         ▼
┌─────────────────┐
│     daemon      │
│ (Python + MLX)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Audio Output   │
│     (rodio)     │
└─────────────────┘
Troubleshooting
| Problem | Fix |
|---|---|
| No audio | curl http://127.0.0.1:7125/health |
| Daemon stuck | pkill -f "daemon_streaming" |
| Slow first run | Normal - model loading (2-5s) |
See Also
Need voice cloning? Emotion tags? Try speak.
MIT License · Built on Pocket TTS
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.