itechmeat / llm-code-inworld exact

Inworld TTS API. Covers voice cloning, audio markups, timestamps. Keywords: text-to-speech, visemes.

Andrejones92 / canifi-life-os-descript exact

Edit audio and video with Descript - transcribe, edit, and produce multimedia content using text-based editing

Andrejones92 / canifi-life-os-murf exact

Create AI voiceovers with Murf - generate professional narration, manage projects, and export audio

Andrejones92 / canifi-life-os-elevenlabs exact

Generate realistic AI voices with ElevenLabs - create speech, clone voices, and manage audio projects

siviter-xyz / dot-agent-media-processing exact

Media processing utilities for images, audio, and video using FFmpeg and ImageMagick. Use when working with media conversion, optimization, or batch processing tasks.

dkyazzentwatwa / chatgpt-skills-podcast-splitter exact

Split audio files by detecting silence gaps. Auto-segment podcasts into chapters, remove long silences, and export individual clips.

martinholovsky / claude-skills-generator-wake-word-detection exact

Expert skill for implementing wake word detection with openWakeWord. Covers audio monitoring, keyword spotting, privacy protection, and efficient always-listening systems for JARVIS voice assistant.

martinholovsky / claude-skills-generator-speech-to-text exact

Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for JARVIS voice assistant.

linxule / interpretive-orchestration-interview-ingest exact

This skill should be used when users have audio interview recordings to transcribe, need to convert PDF documents, mention 'import data', 'transcribe', or 'convert', or are starting data preparation...

ngxtm / devkit-ai-multimodal exact

Analyze images/audio/video with Gemini API (better vision than Claude). Generate images (Imagen 4), videos (Veo 3). Use for vision analysis, transcription, OCR, design extraction, multimodal AI.

lwmxiaobei / yt-dlp-skill exact

Download videos and extract audio from various platforms using yt-dlp. Use when user provides a video URL, asks to download a video, or when conversation contains video links from YouTube,...

Xsir0 / xsir-skills-google-gemini-media exact

Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".

lattifai / omni-captions-skills-omnicaptions-transcribe exact

Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.

Hildegaardchiasmal966 / claude-skills-gemini-live-api exact

Expert developer skill for implementing real-time voice and video interactions using the Google Gemini Live API. This skill should be used when implementing bidirectional audio streaming, voice...

ratacat / claude-skills-assemblyai-streaming exact

This skill should be used when working with AssemblyAI’s Speech-to-Text and LLM Gateway APIs, especially for streaming/live transcription, meeting notetakers, and voice agents that need...

zechenzhangAGI / ai-research-skills-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...

ovachiever / droid-tings-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...

0xbeedao / agentic-tools-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...

qodex-ai / ai-agent-skills-voice-ai-integration exact

Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice...

tomkrikorian / visionosagents-realitykit-visionos-developer exact

Build, debug, and optimize RealityKit scenes for visionOS, including entity/component setup, rendering, animation, physics, audio, input, attachments, and custom systems. Use when implementing...