Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world...
Set up Biome (default) or ESLint + Prettier, Vitest testing, and pre-commit hooks for any JavaScript/TypeScript project. Uses Bun as the package manager. Use this skill when initializing code...
Multimodal AI processing via Google Gemini API (2M-token context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection,...
Coding standards for readable, maintainable, testable code including SOLID principles, clean code practices, DDD, and TDD. Use when implementing new features, refactoring code, performing code...
Review code for quality, security, accessibility, and best practices in Next.js and TypeScript projects. Use when the user asks for a code review, review of a PR, or feedback on code changes....
Develop high-quality, accessible React components using shadcn-ui, Tailwind CSS, and Radix UI. Use when building forms, layouts, dialogs, tables, or any UI components. Supports Next.js, Vite,...
Generate draw.io editable diagrams (.drawio, .drawio.svg) from text, images, or Excel. Orchestrates 3-agent workflow (Analysis → Manifest → SVG generation) with quality gates. Use when creating...
Expert product strategist for vision, strategy, and market positioning. Use when defining product vision, assessing product-market fit, sizing market opportunities (TAM/SAM/SOM), competitive...
Improves the quality of images, especially screenshots, by enhancing
Create production-quality Android applications following Google's official architecture guidance and NowInAndroid best practices. Use when building Android apps with Kotlin, Jetpack Compose, MVVM...
Analyze project features against ICP (Ideal Customer Profile) needs to identify gaps and recommend roadmap priorities. Use this skill when asked to evaluate current product state, identify what...
Prepare designs for development handoff. Document specifications, interactions, and assets to enable efficient development and maintain design quality.
Create distinctive, production-grade frontend interfaces with high design quality using the primer design system and brand guidelines. Use this skill when the user asks to build web components,...
Analyze blog posts for SEO, readability, headline quality, and content structure. Generate meta tags and optimization recommendations with scoring.
Challenge idea assumptions with skeptical VC-style evaluation. Use when the user requests critique, validation, or an 'is this a good idea' assessment.
Cross-platform and native mobile development. Frameworks: React Native, Flutter, Swift/SwiftUI, Kotlin/Jetpack Compose. Capabilities: mobile UI, offline-first architecture, push notifications,...
Security audit patterns for PHP/OWASP. Use when conducting security assessments, identifying vulnerabilities (XXE, SQL injection, XSS), or CVSS scoring.
Schema for tracking code review outcomes to enable feedback-driven skill improvement. Use when logging review results or analyzing review quality.