```bash
npx skills add machine-machine/openclaw-tts-skill
```

Or install the skill directly from its repo:

```bash
npx add-skill https://github.com/machine-machine/openclaw-tts-skill
```
# Description
TTS for OpenClaw agents
# SKILL.md
## m2 Voice Skill

Text-to-Speech service for OpenClaw agents.

## Quick Start

```bash
# Generate speech (file mode)
curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am m2!", "voice": "default", "format": "mp3"}'

# Returns: {"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}

# OpenAI-compatible endpoint
curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3
```
## Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/v1/audio/speech` | POST | OpenAI-compatible TTS |
| `/speak/file` | POST | Generate file, return URL |
| `/ws/tts` | WebSocket | Streaming audio |
| `/files/{id}` | GET | Download generated file |
Request Format
{
"text": "Text to speak",
"voice": "default",
"format": "mp3",
"stream": false
}
## Voices

Available voices depend on which backend is running:

- Qwen3-TTS: expressive, multi-voice (GPU required)
- Piper: fast, lightweight (CPU-friendly)
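
The `voice` field selects among whatever the active backend exposes; the request shape is the same either way. A minimal sketch, where the voice id is hypothetical (actual names depend on the deployed backend):

```python
import httpx

# Request a specific voice; "en_US-lessac-medium" is illustrative only,
# not a confirmed voice id for this service.
resp = httpx.post(
    "https://voice.machinemachine.ai/speak/file",
    json={"text": "Hello!", "voice": "en_US-lessac-medium", "format": "mp3"},
)
print(resp.json())  # {"url": "/files/...", "format": "mp3", "id": "..."}
```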
## Integration with OpenClaw

```python
# In your agent code
import httpx

async def speak(text: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"},
        )
        return resp.json()["url"]
```
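
Note that the returned `url` is a relative path like `/files/abc123.mp3`. Assuming `PUBLIC_URL` is unset on the server (so links come back relative), the client can resolve it itself; a sketch building on `speak()` above:

```python
BASE = "https://voice.machinemachine.ai"

async def speak_absolute(text: str) -> str:
    # Prefix the gateway base (or your configured PUBLIC_URL) so the
    # agent gets a link it can hand out directly.
    return BASE + await speak(text)
```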
## WebSocket Streaming

```javascript
const audioContext = new AudioContext();
const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');

ws.onopen = () => {
  ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
};

ws.onmessage = async (event) => {
  if (event.data instanceof Blob) {
    // Audio chunk - decodeAudioData expects an ArrayBuffer, so convert the Blob first
    const buffer = await event.data.arrayBuffer();
    audioContext.decodeAudioData(buffer);
  } else {
    // JSON status message
    const status = JSON.parse(event.data);
    if (status.status === "done") console.log("Audio complete");
  }
};
```
## Deployment

See `docker-compose.yml` (GPU) or `docker-compose.cpu.yml` (CPU-only).

Domain: voice.machinemachine.ai
# README.md
## m2 Voice - TTS for OpenClaw

> "Finally, I can speak." - m2

Text-to-Speech service designed for AI agents. OpenAI-compatible API with file output and WebSocket streaming.
## Features

- OpenAI-compatible `/v1/audio/speech` endpoint
- File generation with URL return for async workflows
- WebSocket streaming for real-time audio
- Multiple backends: Qwen3-TTS (GPU) or Piper (CPU)
- Coolify-ready Docker Compose
## Quick Deploy (Coolify)

1. Create a new project in Coolify
2. Add Resource → Docker Compose → Git repo
3. Point to this repo
4. Enable GPU if using Qwen3-TTS
5. Set domain: `voice.yourdomain.ai`
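
Once the stack is up, a request to `/health` confirms the gateway is reachable. A minimal smoke test (the exact response body isn't documented here, so it is just printed):

```python
import httpx

# Hit the gateway's health endpoint; a 200 means the service is up.
resp = httpx.get("https://voice.yourdomain.ai/health")
resp.raise_for_status()
print(resp.text)
```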
## Architecture

```
┌─────────────────────────────────────────────────┐
│                  Coolify Proxy                  │
│                (TLS termination)                │
└────────────────────────┬────────────────────────┘
                         │
┌────────────────────────▼────────────────────────┐
│             speech-gateway (FastAPI)            │
│                                                 │
│  • /v1/audio/speech - OpenAI compatible         │
│  • /speak/file      - File generation           │
│  • /ws/tts          - WebSocket streaming       │
│  • /files/{id}      - Serve generated files     │
└────────────────────────┬────────────────────────┘
                         │
┌────────────────────────▼────────────────────────┐
│               qwen-tts / piper-tts              │
│                  (TTS Backend)                  │
│                                                 │
│  GPU: Qwen3-TTS-1.7B (expressive, multi-voice)  │
│  CPU: Piper (fast, lightweight)                 │
└─────────────────────────────────────────────────┘
```
## Usage

### Generate Speech (File)

```bash
curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from m2!", "format": "mp3"}'
```

Response:

```json
{"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}
```
### OpenAI Compatible

```bash
curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3
```
### WebSocket Streaming

```javascript
const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');
// Wait for the connection to open before sending, then receive binary audio chunks
ws.onopen = () => ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
```
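
The same protocol works from non-browser clients. A sketch using the third-party `websockets` package, assuming (per the SKILL.md example above) that binary frames carry audio and a JSON `{"status": "done"}` message marks the end:

```python
import asyncio
import json

import websockets  # third-party: pip install websockets

async def stream_tts(text: str, path: str = "speech.mp3") -> None:
    async with websockets.connect("wss://voice.machinemachine.ai/ws/tts") as ws:
        await ws.send(json.dumps({"text": text, "voice": "default"}))
        with open(path, "wb") as f:
            async for message in ws:
                if isinstance(message, bytes):
                    f.write(message)  # binary audio chunk
                elif json.loads(message).get("status") == "done":
                    break  # server signals completion

asyncio.run(stream_tts("Hello!"))
```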
## Configuration

### GPU Version (Qwen3-TTS)

```bash
# Use docker-compose.yml
docker compose up -d
```

Requires an NVIDIA GPU with CUDA 12.1+.

### CPU Version (Piper)

```bash
# Use docker-compose.cpu.yml
docker compose -f docker-compose.cpu.yml up -d
```

Runs on any machine, with faster inference and smaller models.
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `TTS_BASE_URL` | `http://qwen-tts:8000` | Backend TTS service |
| `PUBLIC_URL` | (empty) | Base URL for file links |
| `STORAGE_DIR` | `/app/output` | Where generated audio files are stored |
## Integration

### Python (OpenClaw agents)

```python
import httpx

async def speak(text: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"},
        )
        return resp.json()["url"]
```
### Telegram Bot

Send the audio URL directly, or download the file and send it as a voice message.
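
A minimal sketch using the third-party `python-telegram-bot` package; `BOT_TOKEN` and `CHAT_ID` are placeholders, and the absolute link is built by prefixing the gateway base as above:

```python
import asyncio

import httpx
from telegram import Bot  # third-party: pip install python-telegram-bot

BOT_TOKEN = "..."  # placeholder
CHAT_ID = 123456   # placeholder

async def speak_to_telegram(text: str) -> None:
    meta = httpx.post(
        "https://voice.machinemachine.ai/speak/file",
        json={"text": text, "format": "mp3"},
    ).json()
    audio_url = "https://voice.machinemachine.ai" + meta["url"]
    # Telegram accepts a public HTTP URL for audio uploads.
    await Bot(BOT_TOKEN).send_audio(chat_id=CHAT_ID, audio=audio_url)

asyncio.run(speak_to_telegram("Hello from m2!"))
```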
## License

MIT

Part of the OpenClaw ecosystem.
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with all major AI coding agents.