Visual design intelligence and UI aesthetics. Integrates: chrome-devtools, ai-multimodal, media-processing. Capabilities: design analysis, visual hierarchy, color theory, typography,...
Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use for web browsing, form submission, web scraping, or UI testing. NOT...
Control the user own Chrome browser via Playwriter extension with Playwright code snippets in a stateful local js sandbox via playwriter cli. Automate web interactions, take screenshots, inspect...
Convert raw notes, error logs, voice dictation, or screenshots into crisp GitHub-flavored markdown issue reports. Use when the user pastes bug info, error messages, or informal descriptions and...
Analyzes Figma designs and generates implementation-ready PRDs with detailed visual specifications. Use when user provides Figma link or uploads design screenshots. Requires Figma MCP server connection.
GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table...
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech...
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech...
Interact with a web browser. Can start a browser, connect to it, evaluate JavaScript, make screenshots, read console logs and let the user select DOM elements. Use when interacting with unknown...
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis...
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis...
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis...
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection,...
Legendary blog writing that makes readers forget they're reading. This skill combines the narrative mastery of Paul Graham's essays, the technical accessibility of Julia Evans, the conversational...
Enhance and improve image quality and visual content. Applies enhancement techniques, color correction, and optimization transformations.
Analyze images using AI with MiniMax understand_image tool
Analyze images using AI with the understand_image tool
Transform markdown/HTML into styled 3:4 ratio images for Xiaohongshu
|
Browser automation for E2E testing. Use when testing user journeys, verifying UI behavior, or running end-to-end tests.