Inworld TTS API. Covers voice cloning, audio markups, timestamps. Keywords: text-to-speech, visemes.
Edit audio and video with Descript - transcribe, edit, and produce multimedia content using text-based editing
Create AI voiceovers with Murf - generate professional narration, manage projects, and export audio
Generate realistic AI voices with ElevenLabs - create speech, clone voices, and manage audio projects
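A minimal sketch of one common pattern for this: calling the ElevenLabs text-to-speech HTTP endpoint directly. The environment variable name, voice ID, and `model_id` value are placeholders/assumptions; check the current API reference before relying on them.

```python
# Hedged sketch: synthesize speech via the ElevenLabs HTTP API and save the MP3.
import os
import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]   # assumed env var name
VOICE_ID = "your-voice-id"                   # placeholder voice ID

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": "Hello from a cloned voice.", "model_id": "eleven_multilingual_v2"},
    timeout=60,
)
resp.raise_for_status()
with open("narration.mp3", "wb") as f:
    f.write(resp.content)  # response body is the encoded audio
```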
Media processing utilities for images, audio, and video using FFmpeg and ImageMagick. Use when working with media conversion, optimization, or batch processing tasks.
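As one illustration of the FFmpeg half of this, a small sketch that shells out to ffmpeg for batch H.264/AAC conversion; the CRF, preset, and directory names are reasonable defaults and placeholders, not prescriptions.

```python
# Batch-convert videos to MP4 (H.264 video, AAC audio) by shelling out to ffmpeg.
import subprocess
from pathlib import Path

def convert_to_mp4(src: Path, dst_dir: Path) -> Path:
    dst = dst_dir / (src.stem + ".mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-c:v", "libx264", "-crf", "23", "-preset", "medium",
         "-c:a", "aac", "-b:a", "128k", str(dst)],
        check=True,
    )
    return dst

for clip in Path("raw").glob("*.mov"):   # assumed input directory
    convert_to_mp4(clip, Path("out"))
```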
Split audio files by detecting silence gaps. Auto-segment podcasts into chapters, remove long silences, and export individual clips.
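One way to sketch silence-based segmentation is with pydub's silence utilities; the thresholds below are assumptions to tune per recording.

```python
# Split a recording on silence gaps and export each chunk as its own clip.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("podcast.mp3")
chunks = split_on_silence(
    audio,
    min_silence_len=1000,            # silence must last at least 1 s to count as a gap
    silence_thresh=audio.dBFS - 16,  # 16 dB below average loudness counts as silence
    keep_silence=300,                # keep 300 ms of padding around each clip
)
for i, chunk in enumerate(chunks):
    chunk.export(f"clip_{i:03d}.mp3", format="mp3")
```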
Expert skill for implementing wake word detection with openWakeWord. Covers audio monitoring, keyword spotting, privacy protection, and efficient always-listening systems for JARVIS voice assistant.
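A rough always-listening loop with openWakeWord might look like the sketch below; the frame size, score threshold, and use of sounddevice for capture are assumptions, not the skill's prescribed setup.

```python
# Hedged sketch: stream microphone audio into openWakeWord and report detections.
import numpy as np
import sounddevice as sd
from openwakeword.model import Model

model = Model()          # loads the bundled pretrained wake-word models
FRAME = 1280             # ~80 ms of 16 kHz audio per prediction step (assumed frame size)

def on_audio(indata, frames, time, status):
    pcm = (indata[:, 0] * 32767).astype(np.int16)        # float32 mic samples -> int16 PCM
    scores = model.predict(pcm)                          # {model_name: confidence}
    hits = {k: v for k, v in scores.items() if v > 0.5}  # simple threshold; tune per model
    if hits:
        print("wake word detected:", hits)

with sd.InputStream(samplerate=16000, channels=1, blocksize=FRAME, callback=on_audio):
    sd.sleep(60_000)     # listen for one minute
```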
Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for JARVIS voice assistant.
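A minimal faster-whisper transcription sketch, assuming a CPU-friendly model size and int8 compute; both are tunable speed/accuracy trade-offs rather than requirements.

```python
# Local, offline transcription with faster-whisper, printing timestamped segments.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("command.wav", beam_size=5, vad_filter=True)

print(f"detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:6.2f} -> {seg.end:6.2f}] {seg.text}")
```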
This skill should be used when users have audio interview recordings to transcribe, need to convert PDF documents, mention 'import data', 'transcribe', or 'convert', or are starting data preparation...
Analyze images/audio/video with Gemini API (better vision than Claude). Generate images (Imagen 4), videos (Veo 3). Use for vision analysis, transcription, OCR, design extraction, multimodal AI.
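For the vision-analysis side, a hedged sketch using the google-genai client; the SDK surface and the model name change often, so treat both as assumptions.

```python
# Hedged sketch: send an image plus a prompt to Gemini and print the response text.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment
with open("diagram.png", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model name; substitute the current one
    contents=[image, "Extract all text and describe the layout of this image."],
)
print(response.text)
```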
Download videos and extract audio from various platforms using yt-dlp. Use when user provides a video URL, asks to download a video, or when conversation contains video links from YouTube,...
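A short sketch of audio extraction through yt-dlp's Python API; the option keys mirror the CLI flags, the ffmpeg postprocessor needs ffmpeg on PATH, and the URL is a placeholder.

```python
# Download best-quality audio and convert it to MP3 with yt-dlp.
from yt_dlp import YoutubeDL

opts = {
    "format": "bestaudio/best",
    "outtmpl": "downloads/%(title)s.%(ext)s",
    "postprocessors": [{
        "key": "FFmpegExtractAudio",
        "preferredcodec": "mp3",
        "preferredquality": "192",
    }],
}
with YoutubeDL(opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=EXAMPLE_ID"])  # placeholder URL
```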
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech, and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.
Expert developer skill for implementing real-time voice and video interactions using the Google Gemini Live API. This skill should be used when implementing bidirectional audio streaming, voice...
This skill should be used when working with AssemblyAI’s Speech-to-Text and LLM Gateway APIs, especially for streaming/live transcription, meeting notetakers, and voice agents that need...
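For the batch (non-streaming) path, a minimal sketch with the assemblyai SDK; streaming/live transcription goes through the SDK's separate realtime interface, not shown here, and the API key is a placeholder.

```python
# Transcribe a local file (or URL) with AssemblyAI and print the transcript text.
import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"  # placeholder
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("meeting.mp3")

if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)
print(transcript.text)
```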
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...
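A minimal sketch with the open-source openai-whisper package; "base" stands in for any of the six model sizes, and ffmpeg must be installed for audio decoding.

```python
# Transcribe in the source language, then translate the same file to English.
import whisper

model = whisper.load_model("base")
result = model.transcribe("interview.mp3")
print(result["text"])

translated = model.transcribe("interview.mp3", task="translate")
print(translated["text"])
```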
Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice...
Build, debug, and optimize RealityKit scenes for visionOS, including entity/component setup, rendering, animation, physics, audio, input, attachments, and custom systems. Use when implementing...