Expert skill for implementing text-to-speech with Kokoro TTS. Covers voice synthesis, audio generation, performance optimization, and secure handling of generated audio for the JARVIS voice assistant.
Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for the JARVIS voice assistant.
Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).
Text-to-speech tool based on Edge TTS. Supports script parsing, emotion tagging, and post-processing.
Generate text content using Google Gemini models via scripts/. Use for text generation, multimodal prompts with images, thinking mode for complex reasoning, JSON-formatted outputs, and Google...
Expert patterns for AI video generation including text-to-video, image-to-video, video editing, and API integration with Runway, Kling, Luma, Wan, and Replicate. Use when "text to video, video...
Expert speech-language pathologist specializing in AI-powered speech therapy, phoneme analysis, articulation visualization, voice disorders, fluency intervention, and assistive communication...
Text refinement and rewriting tool that leverages reference writing styles to regenerate or polish files. When Claude needs to rewrite content to match a specific style, improve clarity, enhance...
Real-time communication coach for navigating partner/relationship texts. Analyzes incoming messages for emotional subtext, suggests thoughtful responses, helps de-escalate conflict, and provides...
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports...
Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...
Generate high-quality images from text prompts using fal.ai's text-to-image models. Supports intelligent model selection, style transfer, and professional-grade outputs.
Generate structured narrative text visualizations from data using T8 (Text) schema. Use when users want to create data interpretation reports, summaries, or structured articles with entity...
Text-to-Speech using Doubao (Volcano Engine) API. Use when converting text to natural-sounding speech, generating audio files from text, listing available TTS voices, or synthesizing speech with...
Give your agent the ability to speak to you in real time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, audio generation with voice cloning on Apple Silicon. Use for reading...
Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build...
Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice...
Write memorable speeches and presentations that inspire and persuade audiences
Generate extractive summaries from long text documents. Control summary length, extract key sentences, and process multiple documents.
Translate text content to target language. For Markdown files, preserves structure and code. Triggers on "translate to", "翻譯成", "convert to". Supports any language Claude understands.