Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis...
Analyze images/audio/video with Gemini API (better vision than Claude). Generate images (Imagen 4), videos (Veo 3). Use for vision analysis, transcription, OCR, design extraction, multimodal AI.
Latest AI models reference - Claude, OpenAI, Gemini, Eleven Labs, Replicate
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection,...
Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure...
Generate text content using Google Gemini models via scripts/. Use for text generation, multimodal prompts with images, thinking mode for complex reasoning, JSON-formatted outputs, and Google...
Use this skill when building AI features, integrating LLMs, implementing RAG, working with embeddings, deploying ML models, or doing data science. Activates on mentions of OpenAI, Anthropic,...
Expert in building products that wrap AI APIs (OpenAI, Anthropic, etc.) into focused tools people will pay for. Not just 'ChatGPT but different' - products that solve specific problems with AI....
This sop analyzes a codebase and generates comprehensive documentation including structured metadata files that describe the system architecture, components, interfaces, and workflows. It can...
Remove Gemini logos, watermarks, or AI-generated image markers using OpenCV inpainting. Use this skill when the user asks to remove Gemini logo, AI watermark, or any logo/watermark from images.
Expert in applying AI to education - AI tutors, personalized learning paths, content generation, automated assessments, and adaptive learning systems. Covers practical implementation of AI to...
Enables Claude to interact with Gemini AI chat for quick queries, brainstorming, and alternative AI perspectives
Generate images using Google Gemini AI with text prompts and reference images. Use when creating game assets, concept art, UI mockups, promotional images, or any visual content. Supports...
Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication...
Expert in Cursor AI IDE - the leading AI-powered code editor. Covers Rules files for project-specific AI behavior, Plan Mode for structured development, Background Agents for parallel work, and...
Interact with Google's Gemini model via CLI. Use when needing a second opinion from another LLM, cross-validation, or leveraging Gemini's Google Search grounding. Supports multi-turn conversations...
Configure or debug LLM blog post generation using Vercel AI SDK and Google Gemini. Use when updating blog generation prompts, fixing AI integration issues, modifying content generation logic, or...
Patterns for running AI models locally in browsers using WebGPU, Transformers.js, WebLLM, and ONNX Runtime. Zero API costs, full privacy. Use when "on-device AI, browser AI, WebLLM,...
Generate images using Google Gemini and Imagen models via scripts/. Use for AI image generation, text-to-image, creating visuals from prompts, generating multiple images, custom aspect ratios, and...
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis...