Give your agent the ability to speak to you real-time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, audio generation with voice cloning on Apple Silicon. Use for reading...
Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice...
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports...
MiniMax TTS API - 文本转语音、声音克隆、声音设计
>
>
Local text-to-speech via sherpa-onnx (offline, no cloud)
Local text-to-speech via sherpa-onnx (offline, no cloud)
Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation...
Configure Home Assistant Assist voice control with pipelines, intents, wake words, and speech processing. Use when setting up voice control, creating custom intents, configuring TTS/STT, or...
Expert in voice synthesis, TTS, voice cloning, podcast production, speech processing, and voice UI design via ElevenLabs integration. Specializes in vocal clarity, loudness standards (LUFS),...
文本转语音工具 - 支持脚本解析、情绪标记和后处理,基于 Edge TTS
Inworld TTS API. Covers voice cloning, audio markups, timestamps. Keywords: text-to-speech, visemes.
使用 edge-tts 生成多语言配音(中文/英文)。当需要为视频生成语音旁白、基于时间线同步配音时使用。支持语速调整、多种声音选择和配音验证。
Expert skill for implementing text-to-speech with Kokoro TTS. Covers voice synthesis, audio generation, performance optimization, and secure handling of generated audio for JARVIS voice assistant.
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
Create professional kinetic typography videos from scratch. Includes speech writing, TTS with emotional dynamics, music generation, and animated text. Use for: promo videos, explainers, social...
Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation...