1298 results (11.6ms) page 3 / 65
moltbot / moltbot-openai-whisper-api exact

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

ASR 0.00
AnswerZhao / agent-skills-glm-skills-asr exact

Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build...

erichowens / some-claude-skills-clip-aware-embeddings exact

Semantic image-text matching with CLIP and alternatives. Use for image search, zero-shot classification, similarity matching. NOT for counting objects, fine-grained classification (celebrities,...

ngxtm / devkit-convert-pcm-to-wav-see-scripts-pcm-to-wav-py exact

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast...

omer-metin / skills-for-antigravity-multimodal-ai exact

Patterns for building multimodal AI applications that combine text, images, audio, and video. Covers vision APIs, audio transcription, and unified pipelines. Use when "multimodal AI, vision API,...

jackspace / claudeskillz-markitdown exact

Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting...

0xbeedao / agentic-tools-markitdown exact

Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting...

ovachiever / droid-tings-markitdown exact

Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting...

nodnarbnitram / claude-code-extensions-esphome-box3-builder exact

This skill should be used when the user asks to "configure esp32-s3-box-3", "set up box-3", "create box-3 voice assistant", "display lambda on box-3", "configure ili9xxx display", "set up gt911...

samhvw8 / dot-claude-ai-multimodal exact

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection,...

ImGoodBai / goodable-good-ttvideo2text exact

Extract audio from short videos (Douyin/TikTok) and transcribe to text with timestamps. Use when user provides video URL and needs audio transcription.

martinholovsky / claude-skills-generator-text-to-speech exact

Expert skill for implementing text-to-speech with Kokoro TTS. Covers voice synthesis, audio generation, performance optimization, and secure handling of generated audio for JARVIS voice assistant.

digitalsamba / claude-code-video-toolkit-elevenlabs exact

Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue,...

steipete / agent-scripts-video-transcript-downloader exact

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”,...

devskale / skale-skills-video-transcript-downloader exact

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”,...

iamzhihuix / happy-claude-skills-video-processor exact

Download and process videos from YouTube and other platforms. Supports video download, audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions YouTube...

lancenunes / codex-skills-video-transcript-downloader exact

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”,...

MapleShaw / yt-dlp-downloader-skill exact

Download videos from YouTube, Bilibili, Twitter, and thousands of other sites using yt-dlp. Use when the user provides a video URL and wants to download it, extract audio (MP3), download...

404kidwiz / claude-supercode-skills-nlp-engineer exact

Expert in Natural Language Processing, designing systems for text classification, NER, translation, and LLM integration using Hugging Face, spaCy, and LangChain. Use when building NLP pipelines,...