Google Drive CLI for listing, searching, uploading, downloading, and sharing files and folders.
Fetch transcripts from YouTube videos for summarization and analysis.
VS Code integration for viewing diffs and comparing files. Use when showing file differences to the user.
Interactive browser automation via Chrome DevTools Protocol. Use when you need to interact with web pages, test frontends, or when user interaction with a visible browser is required.
Gmail CLI for searching emails, reading threads, sending messages, managing drafts, and handling labels/attachments.
Web search and content extraction via Brave Search API. Use for searching documentation, facts, or any web content. Lightweight, no browser required.
Google Calendar CLI for listing calendars, viewing/creating/updating events, and checking availability.
Expert in spatial audio, procedural sound design, game audio middleware, and app UX sound design. Specializes in HRTF/Ambisonics, Wwise/FMOD integration, UI sound design, and adaptive music...
Video/audio/image processing with FFmpeg and ImageMagick. Tools: FFmpeg (video/audio), ImageMagick (images). Capabilities: format conversion, encoding (H.264/H.265/VP9/AV1), streaming (HLS/DASH),...
Expert in 2000s-era music visualization (Milkdrop, AVS, Geiss) and modern WebGL implementations. Specializes in Butterchurn integration, Web Audio API AnalyserNode FFT data, GLSL shaders for...
Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing...
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports...
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build...
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast...
Patterns for building multimodal AI applications that combine text, images, audio, and video. Covers vision APIs, audio transcription, and unified pipelines. Use when "multimodal AI, vision API,...
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting...
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting...
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting...