omer-metin / skills-for-antigravity-voice-agents exact

Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation...

omer-metin / skills-for-antigravity-ai-music-audio exact

Comprehensive patterns for AI-powered audio generation including text-to-music, voice synthesis, text-to-speech, sound effects, and audio manipulation using MusicGen, Bark, ElevenLabs, and more....

404kidwiz / agent-skills-backup-voice-ai-development exact

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...

sickn33 / antigravity-awesome-skills-voice-ai-development exact

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...

automindtechnologie-jpg / ultimate-skill-md-voice-ai-development exact

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...

cleodin / antigravity-awesome-skills-voice-ai-development exact

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...

halay08 / fullstack-agent-skills-voice-ai-development exact

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...

ngxtm / devkit-voice-ai-development exact

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...

erichowens / some-claude-skills-voice-audio-engineer exact

Expert in voice synthesis, TTS, voice cloning, podcast production, speech processing, and voice UI design via ElevenLabs integration. Specializes in vocal clarity, loudness standards (LUFS),...

samhvw8 / dot-claude-ai-multimodal exact

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection,...

binjuhor / shadcn-lar-ai-multimodal exact

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech...

binhmuc / autobot-review-ai-multimodal exact

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech...

qodex-ai / ai-agent-skills-voice-ai-integration exact

Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice...

aviz85 / claude-skills-library-kinetic-video-creator exact

Create professional kinetic typography videos from scratch. Includes speech writing, TTS with emotional dynamics, music generation, and animated text. Use for: promo videos, explainers, social...

zechenzhangAGI / ai-research-skills-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...

ovachiever / droid-tings-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...

0xbeedao / agentic-tools-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...

jackspace / claudeskillz-ai-multimodal exact

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis...