```bash
npx skills add EmZod/Speak-Turbo
```
Or install the specific skill:
```bash
npx add-skill https://github.com/EmZod/Speak-Turbo
```
# Description
Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
# SKILL.md
---
name: speakturbo-tts
description: Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
---
## speakturbo - Talk to your Claude!
Give your agent the ability to speak to you in real time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.
### Quick Start
```bash
# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms

# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav   # Should show a ~50-100KB file
```
Output explained: ⚡ = first audio received, ▶ = playback started, ✓ = done.
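You can check the saved file more precisely than `ls -lh` by reading its WAV header with Python's standard `wave` module and comparing it with the 24kHz mono format listed under Performance. This is a sketch under stated assumptions. The helper names are hypothetical, and it assumes speakturbo writes integer PCM WAV. The bit depth is not documented, and `wave` raises `wave.Error` on float-sample files. It only inspects an existing file and never calls speakturbo.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format info from a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def looks_like_speakturbo_output(info: dict) -> bool:
    # Expected format per the Performance table: 24kHz mono, non-empty audio.
    return (
        info["sample_rate"] == 24000
        and info["channels"] == 1
        and info["duration_s"] > 0
    )
```

For example, `describe_wav("test.wav")` on the Quick Start file should report a sample rate of 24000 and one channel if the output matches the documented format.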
### First Run
The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Subsequent calls reach first sound in ~90ms.
```bash
# First run (slow - daemon starting)
speakturbo "Starting up"    # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"   # ~90ms
```
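A script may want to confirm that the daemon is warm before it relies on the ~90ms figure. One way is to poll the `/health` endpoint used under Troubleshooting. This standard-library Python sketch is not part of speakturbo. The function name and polling interval are illustrative, and it assumes the endpoint reports `"status": "ready"` only once the model is loaded. It does not start the daemon; per this section, the first `speakturbo` call does that.

```python
import http.client
import json
import time
import urllib.request

HEALTH_URL = "http://127.0.0.1:7125/health"  # endpoint used in Troubleshooting

# The daemon is local, so bypass any HTTP proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_ready(url: str = HEALTH_URL, timeout: float = 10.0,
                     interval: float = 0.25) -> bool:
    """Poll the health endpoint until it reports "ready" or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=1.0) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("status") == "ready":
                return True
        except (OSError, ValueError, http.client.HTTPException):
            # Connection refused, HTTP error, timeout, or a non-JSON body:
            # treat all of these as "not ready yet" and retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling with a deadline keeps a script from hanging forever if the daemon never comes up. A `False` result is the cue to check the port and logs as described in Troubleshooting.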
### Usage
```bash
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices
```
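For agents that call speakturbo from Python rather than from a shell, the options above map directly onto an argv list. This sketch uses only the documented `-o` and `-q` flags. The helper name is hypothetical, and it assumes `-o` and `-q` may be combined in one call.

```python
from typing import List, Optional


def build_speakturbo_cmd(text: str, output: Optional[str] = None,
                         quiet: bool = False) -> List[str]:
    """Build a speakturbo argv list from the documented options."""
    cmd = ["speakturbo", text]
    if output is not None:
        cmd += ["-o", output]   # save to file instead of playing
    if quiet:
        cmd.append("-q")        # suppress status messages
    return cmd
```

For example, `subprocess.run(build_speakturbo_cmd("Hello", quiet=True), check=True)` would play the greeting without status output, assuming `speakturbo` is on `PATH`.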
### Available Voices
| Voice | Type |
|---|---|
| alba | Female (default) |
| marius | Male |
| javert | Male |
| jean | Male |
| fantine | Female |
| cosette | Female |
| eponine | Female |
| azelma | Female |
### Performance
| Metric | Value |
|---|---|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
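Here is a rough worked example of what these figures imply. The sketch assumes 16-bit PCM samples, since the bit depth is not stated above. It also reads "~4x faster than real time" as synthesis taking about a quarter of the audio's duration. At 24,000 samples/s × 2 bytes × 1 channel, one second of audio is 48,000 bytes. A one-to-two-second phrase therefore lands at roughly 48-96KB, which is in line with the ~50-100KB quoted in Quick Start.

```python
SAMPLE_RATE_HZ = 24_000   # from the Performance table
CHANNELS = 1              # mono
BYTES_PER_SAMPLE = 2      # ASSUMPTION: 16-bit PCM (not documented)
WAV_HEADER_BYTES = 44     # size of a canonical PCM WAV header
REALTIME_FACTOR = 4.0     # "~4x faster than real time"


def estimated_wav_bytes(audio_seconds: float) -> int:
    """Approximate size of a saved WAV file for a clip of the given length."""
    samples = round(audio_seconds * SAMPLE_RATE_HZ)
    return WAV_HEADER_BYTES + samples * CHANNELS * BYTES_PER_SAMPLE


def estimated_generation_seconds(audio_seconds: float) -> float:
    """Approximate synthesis time, ignoring the ~90ms time to first sound."""
    return audio_seconds / REALTIME_FACTOR
```

Under these assumptions, a 10-second reply would take about 2.5s to synthesize and about 480KB on disk.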
### Architecture
```text
speakturbo (Rust CLI, 2.2MB)
    │
    │  HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │  Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)
```
### Text Input
- Encoding: UTF-8
- Quotes in text: escape them with backslashes, e.g. `speakturbo "She said \"hello\""`
- Long text: supported; audio streams as it is generated
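When an agent builds the command as a single shell string, Python's standard `shlex.quote` can do the escaping instead of hand-placed backslashes. This is a standard-library sketch: the helper name is hypothetical, and it assumes a POSIX shell such as bash or zsh, since `shlex.quote` does not produce valid quoting for Windows `cmd.exe`.

```python
import shlex


def speakturbo_shell_command(text: str, quiet: bool = False) -> str:
    """Return a POSIX-shell-safe speakturbo command line for arbitrary text."""
    parts = ["speakturbo", shlex.quote(text)]  # quotes/apostrophes handled here
    if quiet:
        parts.append("-q")
    return " ".join(parts)
```

Where possible, passing an argv list to `subprocess.run` without a shell avoids quoting altogether.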
### Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
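A caller can turn the two documented exit codes into an exception. In this sketch the `SpeakturboError` class and the `run_speakturbo` helper are hypothetical. Any code other than 0 or 1 is reported as undocumented, and stderr is captured on the assumption that speakturbo writes its error details there.

```python
import subprocess
from typing import List

EXIT_MEANINGS = {
    0: "success (audio played/saved)",
    1: "error (daemon connection failed, invalid args)",
}


class SpeakturboError(RuntimeError):
    """Raised when a speakturbo command exits non-zero."""


def run_speakturbo(argv: List[str]) -> None:
    """Run a speakturbo command line and raise if it exits non-zero."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        meaning = EXIT_MEANINGS.get(result.returncode, "undocumented exit code")
        raise SpeakturboError(
            f"exit {result.returncode}: {meaning}\n{result.stderr.strip()}"
        )
```

Typical use would be `run_speakturbo(["speakturbo", "Hello", "-q"])`, with the exception message pointing the caller to the Troubleshooting steps.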
### When to Use
Use speakturbo when:
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient

Use speak instead when:
- You need custom voice cloning (Morgan Freeman, etc.), e.g. `speak "text" --voice ~/.chatter/voices/morgan_freeman.wav`
- You need emotion tags like [laugh], [sigh]
- Quality/variety matters more than speed

See the speak skill documentation for full usage.
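An agent choosing between the two skills can encode the criteria above as a small decision helper. This is an illustrative sketch only. The function and its boolean parameters are hypothetical, and it applies just the rules in this section: it picks speak whenever cloning, emotion tags, or quality-over-speed is required, and speakturbo otherwise.

```python
def choose_tts_skill(need_voice_cloning: bool = False,
                     need_emotion_tags: bool = False,
                     quality_over_speed: bool = False) -> str:
    """Pick 'speak' or 'speakturbo' using the When to Use criteria."""
    if need_voice_cloning or need_emotion_tags or quality_over_speed:
        return "speak"        # custom voices, [laugh]/[sigh] tags, or quality first
    return "speakturbo"       # built-in voices, ~90ms to first sound
```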
### Troubleshooting
No audio plays:
```bash
# Check the daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to a file and playing it manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav   # macOS
aplay /tmp/test.wav    # Linux
```
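A script can check the health response instead of eyeballing it. The sketch below parses the JSON shape shown in the expected output and lists any of the eight documented voices the daemon did not report. It assumes the body has a `status` string and a `voices` list of names, as in the example above, and the helper names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple

DOCUMENTED_VOICES = ("alba", "marius", "javert", "jean",
                     "fantine", "cosette", "eponine", "azelma")


def parse_health(body: str) -> Tuple[bool, List[str]]:
    """Return (is_ready, voices) from the /health JSON body."""
    data = json.loads(body)
    ready = data.get("status") == "ready"
    voices = [v for v in data.get("voices", []) if isinstance(v, str)]
    return ready, voices


def missing_voices(voices: Iterable[str]) -> List[str]:
    """List documented voices absent from the daemon's reported voices."""
    reported = set(voices)
    return [v for v in DOCUMENTED_VOICES if v not in reported]
```

The input would be the text printed by `curl -s http://127.0.0.1:7125/health`.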
Daemon won't start:
```bash
# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"   # Auto-restarts the daemon
```
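Where `lsof` is unavailable, a rough Python substitute is to try a TCP connection to the daemon's port. This sketch is an assumption-laden stand-in, not an equivalent: it only tells you whether something accepts connections on 127.0.0.1:7125, not which process owns the port. Pair it with the `/health` check to tell a live speakturbo daemon from some other program. The helper name is hypothetical.

```python
import socket


def port_in_use(port: int = 7125, host: str = "127.0.0.1",
                timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```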
First run is slow:
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).
### Daemon Management
The daemon auto-starts on first use and auto-shuts down after 1 hour idle.
```bash
# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
```
### Comparison with speak
| Feature | speakturbo | speak |
|---|---|---|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |
# README.md
## speakturbo
Talk to your Claude.
~90ms to first sound. Local. Private. Fast.
`speakturbo "Hello world"` → ⚡ 92ms → ▶ 93ms → ✓ done
### Install
For AI agents (Claude Code, Cursor, Windsurf):
```bash
npx skills add EmZod/Speak-Turbo
```
CLI only:
```bash
pip install pocket-tts uvicorn fastapi
cd speakturbo-cli && cargo build --release
```
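Before building, you can check that the CLI-only prerequisites are present with the standard library. This is a sketch with assumptions. The import names `pocket_tts`, `uvicorn`, and `fastapi` are assumed to correspond to the pip packages above, and `cargo` is the only tool checked. The helper names are hypothetical.

```python
import importlib.util
import shutil
from typing import Iterable, List

# ASSUMPTION: import names for the pip packages listed above.
PYTHON_MODULES = ("pocket_tts", "uvicorn", "fastapi")
TOOLS = ("cargo",)  # needed for `cargo build --release`


def missing_python_modules(modules: Iterable[str] = PYTHON_MODULES) -> List[str]:
    """Return the top-level modules the current interpreter cannot find."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def missing_tools(tools: Iterable[str] = TOOLS) -> List[str]:
    """Return the executables not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]
```

An empty result from both functions suggests the `pip install` and Rust toolchain steps are in place.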
### Usage
```bash
speakturbo "Hello world"        # Play instantly
speakturbo "Hello" -o out.wav   # Save to file
speakturbo "Hello" -q           # Quiet mode
speakturbo --list-voices        # Show voices
```
### Voices
```text
alba     ██████████  Female (default)
marius   ██████████  Male
javert   ██████████  Male
jean     ██████████  Male
fantine  ██████████  Female
cosette  ██████████  Female
eponine  ██████████  Female
azelma   ██████████  Female
```
### Performance
```text
Time to first sound  ░░░░░░░░░░░░░░░░░░░░  ~90ms
First run (cold)     ████░░░░░░░░░░░░░░░░  2-5s
Real-time factor     ████████████████░░░░  ~4x faster than real time
```
### Architecture
```text
┌─────────────────┐
│   speakturbo    │
│  (Rust, 2.2MB)  │
└────────┬────────┘
         │ HTTP :7125
         ▼
┌─────────────────┐
│     daemon      │
│ (Python + MLX)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Audio Output   │
│     (rodio)     │
└─────────────────┘
```
### Troubleshooting
| Problem | Fix |
|---|---|
| No audio | `curl http://127.0.0.1:7125/health` |
| Daemon stuck | `pkill -f "daemon_streaming"` |
| Slow first run | Normal - model loading (2-5s) |
### See Also
Need voice cloning? Emotion tags? Try speak.
MIT License · Built on Pocket TTS