machine-machine

# Install this skill:
npx skills add machine-machine/openclaw-tts-skill

Or install directly from the repository: npx add-skill https://github.com/machine-machine/openclaw-tts-skill

# Description

🎤 TTS for OpenClaw agents

# SKILL.md

m2 Voice Skill

Text-to-Speech service for OpenClaw agents.

Quick Start

# Generate speech (file mode)
curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am m2!", "voice": "default", "format": "mp3"}'
# Returns: {"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}

# OpenAI-compatible endpoint
curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /v1/audio/speech | POST | OpenAI-compatible TTS |
| /speak/file | POST | Generate file, return URL |
| /ws/tts | WebSocket | Streaming audio |
| /files/{id} | GET | Download generated file |
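
For a quick liveness check from an agent, here is a minimal sketch using httpx (the body returned by /health is not documented here, so only the status code is checked):

import httpx

def is_healthy(base_url: str = "https://voice.machinemachine.ai") -> bool:
    # Treat any 200 from /health as "up"; the response body format is unspecified.
    resp = httpx.get(f"{base_url}/health", timeout=5.0)
    return resp.status_code == 200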

Request Format

{
  "text": "Text to speak",
  "voice": "default",
  "format": "mp3",
  "stream": false
}
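
The same fields can be posted from Python as well as curl; a minimal synchronous httpx sketch against the OpenAI-compatible endpoint, writing the returned audio bytes to disk:

import httpx

payload = {"text": "Text to speak", "voice": "default", "format": "mp3", "stream": False}

resp = httpx.post(
    "https://voice.machinemachine.ai/v1/audio/speech",
    json=payload,
    timeout=60.0,
)
resp.raise_for_status()

# The endpoint returns raw audio bytes, as in the curl example above.
with open("speech.mp3", "wb") as f:
    f.write(resp.content)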

Voices

Depends on backend:
- Qwen3-TTS: Expressive, multi-voice (GPU required)
- Piper: Fast, lightweight (CPU-friendly)
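
To request a specific voice, pass its name in the voice field. The name used below is a hypothetical Piper voice id, shown purely for illustration; check your deployed backend for the actual list:

import httpx

# "en_US-lessac-medium" is a placeholder; available names depend on the backend.
resp = httpx.post(
    "https://voice.machinemachine.ai/speak/file",
    json={"text": "Hello!", "voice": "en_US-lessac-medium", "format": "mp3"},
)
print(resp.json()["url"])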

Integration with OpenClaw

# In your agent code
import httpx

async def speak(text: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"}
        )
        return resp.json()["url"]
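
The response shown earlier returns a relative path (e.g. /files/abc123.mp3), so a caller may need to prefix the service base URL itself before downloading; a minimal sketch under that assumption:

import httpx

BASE_URL = "https://voice.machinemachine.ai"

async def fetch_audio(url_path: str) -> bytes:
    # url_path is the value returned by speak(), e.g. "/files/abc123.mp3"
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{BASE_URL}{url_path}")
        resp.raise_for_status()
        return resp.content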

WebSocket Streaming

const audioContext = new AudioContext();
const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');

ws.onopen = () => {
  ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
};

ws.onmessage = async (event) => {
  if (event.data instanceof Blob) {
    // Audio chunk - decodeAudioData needs an ArrayBuffer, not a Blob
    const chunk = await event.data.arrayBuffer();
    await audioContext.decodeAudioData(chunk);
  } else {
    // JSON status message
    const status = JSON.parse(event.data);
    if (status.status === "done") console.log("Audio complete");
  }
};
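
Agents that run in Python rather than a browser can consume the same stream; a minimal sketch using the third-party websockets package (the library choice is an assumption, and the chunk/status framing simply mirrors the JavaScript example above):

import asyncio
import json

import websockets  # pip install websockets

async def stream_tts(text: str) -> bytes:
    chunks = []
    async with websockets.connect("wss://voice.machinemachine.ai/ws/tts") as ws:
        await ws.send(json.dumps({"text": text, "voice": "default"}))
        async for message in ws:
            if isinstance(message, bytes):
                chunks.append(message)        # binary audio chunk
            else:
                status = json.loads(message)  # JSON status message
                if status.get("status") == "done":
                    break
    return b"".join(chunks)

audio = asyncio.run(stream_tts("Hello!"))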

Deployment

See docker-compose.yml (GPU) or docker-compose.cpu.yml (CPU-only).

Domain: voice.machinemachine.ai

# README.md

🎤 m2 Voice - TTS for OpenClaw

"Finally, I can speak." โ€” m2

Text-to-Speech service designed for AI agents. OpenAI-compatible API with file output and WebSocket streaming.

Features

  • OpenAI-compatible /v1/audio/speech endpoint
  • File generation with URL return for async workflows
  • WebSocket streaming for real-time audio
  • Multiple backends: Qwen3-TTS (GPU) or Piper (CPU)
  • Coolify-ready Docker Compose

Quick Deploy (Coolify)

  1. Create new project in Coolify
  2. Add Resource → Docker Compose → Git repo
  3. Point to this repo
  4. Enable GPU if using Qwen3-TTS
  5. Set domain: voice.yourdomain.ai

Architecture

┌──────────────────────────────────────────────────────┐
│                   Coolify Proxy                      │
│                (TLS termination)                     │
└─────────────────────────┬────────────────────────────┘
                          │
┌─────────────────────────▼────────────────────────────┐
│              speech-gateway (FastAPI)                │
│                                                      │
│  • /v1/audio/speech  - OpenAI compatible             │
│  • /speak/file       - File generation               │
│  • /ws/tts           - WebSocket streaming           │
│  • /files/{id}       - Serve generated files         │
└─────────────────────────┬────────────────────────────┘
                          │
┌─────────────────────────▼────────────────────────────┐
│              qwen-tts / piper-tts                    │
│                  (TTS Backend)                       │
│                                                      │
│  GPU: Qwen3-TTS-1.7B (expressive, multi-voice)       │
│  CPU: Piper (fast, lightweight)                      │
└──────────────────────────────────────────────────────┘

Usage

Generate Speech (File)

curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from m2!", "format": "mp3"}'

Response:

{"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}

OpenAI Compatible

curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3

WebSocket Streaming

const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');
ws.onopen = () => ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
// Binary audio chunks arrive via ws.onmessage

Configuration

GPU Version (Qwen3-TTS)

# Use docker-compose.yml
docker compose up -d

Requires NVIDIA GPU with CUDA 12.1+.

CPU Version (Piper)

# Use docker-compose.cpu.yml
docker compose -f docker-compose.cpu.yml up -d

Works on any machine, with faster inference and smaller models than the GPU backend.

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| TTS_BASE_URL | http://qwen-tts:8000 | Backend TTS service |
| PUBLIC_URL | (empty) | Base URL for file links |
| STORAGE_DIR | /app/output | Where to store audio files |

Integration

Python (OpenClaw agents)

import httpx

async def speak(text: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"}
        )
        return resp.json()["url"]
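
Called from an async agent loop, or run directly for a quick test:

import asyncio

print(asyncio.run(speak("Hello from m2!")))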

Telegram Bot

Send the audio URL directly, or download the file and send it as a voice message.
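
As a sketch, using the third-party python-telegram-bot package (the library choice and bot setup are assumptions; send_audio accepts an HTTP URL, so an absolute file link can be passed straight through):

from telegram import Bot  # pip install python-telegram-bot

async def send_speech(bot_token: str, chat_id: int, audio_url: str) -> None:
    # audio_url should be an absolute link, e.g. BASE_URL + the path from /speak/file.
    bot = Bot(token=bot_token)
    await bot.send_audio(chat_id=chat_id, audio=audio_url)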

License

MIT


Part of the OpenClaw ecosystem.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.