```bash
npx skills add machine-machine/openclaw-tts-skill
```

Or install the skill directly from its repo:

```bash
npx add-skill https://github.com/machine-machine/openclaw-tts-skill
```
# Description
TTS for OpenClaw agents
# SKILL.md
## m2 Voice Skill

Text-to-Speech service for OpenClaw agents.

## Quick Start

```bash
# Generate speech (file mode)
curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am m2!", "voice": "default", "format": "mp3"}'

# Returns: {"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}

# OpenAI-compatible endpoint
curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3
```
## Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/v1/audio/speech` | POST | OpenAI-compatible TTS |
| `/speak/file` | POST | Generate file, return URL |
| `/ws/tts` | WebSocket | Streaming audio |
| `/files/{id}` | GET | Download generated file |
Request Format
{
"text": "Text to speak",
"voice": "default",
"format": "mp3",
"stream": false
}
## Voices

Available voices depend on which backend is running:

- Qwen3-TTS: expressive, multi-voice (GPU required)
- Piper: fast, lightweight (CPU-friendly)
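
The `voice` field selects among whatever the active backend exposes; the request shape is the same either way. A minimal sketch, where the voice id is hypothetical (actual names depend on the deployed backend):

```python
import httpx

# Request a specific voice; "en_US-lessac-medium" is illustrative only,
# not a confirmed voice id for this service.
resp = httpx.post(
    "https://voice.machinemachine.ai/speak/file",
    json={"text": "Hello!", "voice": "en_US-lessac-medium", "format": "mp3"},
)
print(resp.json())  # {"url": "/files/...", "format": "mp3", "id": "..."}
```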
## Integration with OpenClaw

```python
# In your agent code
import httpx

async def speak(text: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"},
        )
        return resp.json()["url"]
```
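
Note that the returned `url` is a relative path like `/files/abc123.mp3`. Assuming `PUBLIC_URL` is unset on the server (so links come back relative), the client can resolve it itself; a sketch building on `speak()` above:

```python
BASE = "https://voice.machinemachine.ai"

async def speak_absolute(text: str) -> str:
    # Prefix the gateway base (or your configured PUBLIC_URL) so the
    # agent gets a link it can hand out directly.
    return BASE + await speak(text)
```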
## WebSocket Streaming

```javascript
const audioContext = new AudioContext();
const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');

ws.onopen = () => {
  ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
};

ws.onmessage = async (event) => {
  if (event.data instanceof Blob) {
    // Audio chunk - decodeAudioData expects an ArrayBuffer, so convert the Blob first
    const buffer = await event.data.arrayBuffer();
    audioContext.decodeAudioData(buffer);
  } else {
    // JSON status message
    const status = JSON.parse(event.data);
    if (status.status === "done") console.log("Audio complete");
  }
};
```
## Deployment

See `docker-compose.yml` (GPU) or `docker-compose.cpu.yml` (CPU-only).

Domain: voice.machinemachine.ai
# README.md
## m2 Voice - TTS for OpenClaw

> "Finally, I can speak." - m2

Text-to-Speech service designed for AI agents. OpenAI-compatible API with file output and WebSocket streaming.
## Features

- OpenAI-compatible `/v1/audio/speech` endpoint
- File generation with URL return for async workflows
- WebSocket streaming for real-time audio
- Multiple backends: Qwen3-TTS (GPU) or Piper (CPU)
- Coolify-ready Docker Compose
## Quick Deploy (Coolify)

1. Create a new project in Coolify
2. Add Resource → Docker Compose → Git repo
3. Point to this repo
4. Enable GPU if using Qwen3-TTS
5. Set domain: `voice.yourdomain.ai`
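
Once the stack is up, a request to `/health` confirms the gateway is reachable. A minimal smoke test (the exact response body isn't documented here, so it is just printed):

```python
import httpx

# Hit the gateway's health endpoint; a 200 means the service is up.
resp = httpx.get("https://voice.yourdomain.ai/health")
resp.raise_for_status()
print(resp.text)
```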
## Architecture

```
┌─────────────────────────────────────────────────┐
│                  Coolify Proxy                  │
│                (TLS termination)                │
└────────────────────────┬────────────────────────┘
                         │
┌────────────────────────▼────────────────────────┐
│             speech-gateway (FastAPI)            │
│                                                 │
│  • /v1/audio/speech - OpenAI compatible         │
│  • /speak/file      - File generation           │
│  • /ws/tts          - WebSocket streaming       │
│  • /files/{id}      - Serve generated files     │
└────────────────────────┬────────────────────────┘
                         │
┌────────────────────────▼────────────────────────┐
│               qwen-tts / piper-tts              │
│                  (TTS Backend)                  │
│                                                 │
│  GPU: Qwen3-TTS-1.7B (expressive, multi-voice)  │
│  CPU: Piper (fast, lightweight)                 │
└─────────────────────────────────────────────────┘
```
## Usage

### Generate Speech (File)

```bash
curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from m2!", "format": "mp3"}'
```

Response:

```json
{"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}
```
### OpenAI Compatible

```bash
curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3
```
### WebSocket Streaming

```javascript
const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');
// Wait for the connection to open before sending, then receive binary audio chunks
ws.onopen = () => ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
```
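
The same protocol works from non-browser clients. A sketch using the third-party `websockets` package, assuming (per the SKILL.md example above) that binary frames carry audio and a JSON `{"status": "done"}` message marks the end:

```python
import asyncio
import json

import websockets  # third-party: pip install websockets

async def stream_tts(text: str, path: str = "speech.mp3") -> None:
    async with websockets.connect("wss://voice.machinemachine.ai/ws/tts") as ws:
        await ws.send(json.dumps({"text": text, "voice": "default"}))
        with open(path, "wb") as f:
            async for message in ws:
                if isinstance(message, bytes):
                    f.write(message)  # binary audio chunk
                elif json.loads(message).get("status") == "done":
                    break  # server signals completion

asyncio.run(stream_tts("Hello!"))
```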
## Configuration

### GPU Version (Qwen3-TTS)

```bash
# Use docker-compose.yml
docker compose up -d
```

Requires an NVIDIA GPU with CUDA 12.1+.

### CPU Version (Piper)

```bash
# Use docker-compose.cpu.yml
docker compose -f docker-compose.cpu.yml up -d
```

Runs on any machine, with faster inference and smaller models.
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `TTS_BASE_URL` | `http://qwen-tts:8000` | Backend TTS service |
| `PUBLIC_URL` | (empty) | Base URL for file links |
| `STORAGE_DIR` | `/app/output` | Where generated audio files are stored |
## Integration

### Python (OpenClaw agents)

```python
import httpx

async def speak(text: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"},
        )
        return resp.json()["url"]
```
### Telegram Bot

Send the audio URL directly, or download the file and send it as a voice message.
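
A minimal sketch using the third-party `python-telegram-bot` package; `BOT_TOKEN` and `CHAT_ID` are placeholders, and the absolute link is built by prefixing the gateway base as above:

```python
import asyncio

import httpx
from telegram import Bot  # third-party: pip install python-telegram-bot

BOT_TOKEN = "..."  # placeholder
CHAT_ID = 123456   # placeholder

async def speak_to_telegram(text: str) -> None:
    meta = httpx.post(
        "https://voice.machinemachine.ai/speak/file",
        json={"text": text, "format": "mp3"},
    ).json()
    audio_url = "https://voice.machinemachine.ai" + meta["url"]
    # Telegram accepts a public HTTP URL for audio uploads.
    await Bot(BOT_TOKEN).send_audio(chat_id=CHAT_ID, audio=audio_url)

asyncio.run(speak_to_telegram("Hello from m2!"))
```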
## License

MIT

Part of the OpenClaw ecosystem.
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with all major AI coding agents.