speech-build

Name: speech-build
Rating: 5 (9 reviews)
Author: cnemri

by @cnemri in AI & LLM

# Install this skill:

npx skills add cnemri/google-genai-skills --skill "speech-build"

# Description

Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).

# SKILL.md

name: speech-build
description: Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).

Speech Skill (TTS & STT)

Use this skill to implement audio generation and transcription workflows using the google-genai and google-cloud-speech SDKs.

Quick Start Setup

from google import genai
from google.genai import types
# For STT: from google.cloud import speech_v2

client = genai.Client()

Reference Materials

Text-to-Speech (TTS): Gemini-TTS, Chirp 3 HD, Instant Custom Voice.
Speech-to-Text (STT): Chirp 3 Transcription, Diarization, Streaming.
Voices & Locales: Available voices (Aoede, Puck...) and languages.
Prompting Guide: How to control style, accent, and pacing in Gemini-TTS.
Source Code: Deep inspection of SDK internals.

Common Workflows

1. Generate Speech (Gemini-TTS)

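# Request audio output from the TTS model using a prebuilt voice.
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],  # return audio instead of text
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        )
    )
)
# The generated audio bytes are in the first part's inline data.
audio_bytes = response.candidates[0].content.parts[0].inline_data.data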
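
The TTS response carries raw audio rather than a playable file, so a common next step is to wrap it in a WAV container. The sketch below uses only the standard library. It assumes the output is 16-bit mono PCM at 24 kHz, which is what the Gemini-TTS documentation describes at the time of writing; check this against tts.md before relying on it. The helper name `save_pcm_as_wav` is hypothetical and not part of either SDK.

```python
import wave


def save_pcm_as_wav(
    pcm: bytes,
    path: str,
    rate: int = 24000,      # assumed sample rate of Gemini-TTS output
    channels: int = 1,      # assumed mono
    sample_width: int = 2,  # assumed 16-bit samples (2 bytes each)
) -> None:
    """Write raw PCM bytes to `path` as a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)


if __name__ == "__main__":
    # Stand-in data: half a second of silence instead of a real API response.
    silence = b"\x00\x00" * 12000
    save_pcm_as_wav(silence, "out.wav")
```

With a real response, the bytes to pass in would come from the inline audio data of the first content part, for example `response.candidates[0].content.parts[0].inline_data.data`.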

2. Transcribe Audio (Chirp 3)

# Requires google-cloud-speech
from google.cloud import speech_v2
# ... (See stt.md for full setup)
response = speech_client.recognize(...)
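
For orientation, here is a minimal sketch of what a synchronous Chirp 3 transcription might look like with the `google-cloud-speech` V2 client. It is not the setup from stt.md. The helper names (`recognizer_path`, `speech_endpoint`, `transcribe_file`), the `"us"` location, and the `"chirp_3"` model identifier are assumptions for illustration; confirm the supported regions and model name against the reference material. The SDK imports sit inside the function so the two small helpers can be used without the package installed.

```python
def recognizer_path(project_id: str, location: str = "us") -> str:
    """Resource name of the implicit default recognizer ("_")."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


def speech_endpoint(location: str) -> str:
    """Regional API endpoint; "global" uses the unprefixed host."""
    if location == "global":
        return "speech.googleapis.com"
    return f"{location}-speech.googleapis.com"


def transcribe_file(
    audio_path: str,
    project_id: str,
    location: str = "us",         # assumption: a region where Chirp 3 is served
    language_code: str = "en-US",
) -> str:
    """Transcribe a short local audio file and return the joined transcript."""
    # Imported lazily; requires `pip install google-cloud-speech`.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    speech_client = SpeechClient(
        client_options=ClientOptions(api_endpoint=speech_endpoint(location))
    )
    with open(audio_path, "rb") as f:
        content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language_code],
        model="chirp_3",  # assumption: Chirp 3 model identifier
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id, location),
        config=config,
        content=content,
    )
    response = speech_client.recognize(request=request)
    # Each result holds ranked alternatives; keep the top one.
    return " ".join(
        r.alternatives[0].transcript for r in response.results if r.alternatives
    )
```

Calling `transcribe_file("sample.wav", "my-project")` would need Google Cloud credentials and an enabled Speech-to-Text API; both the file name and the project ID here are placeholders.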

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
