Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description...
Use when creating, editing, evaluating, testing, or verifying ANY skill or skill-related file (SKILL.md, skill resources, skill scripts, or skill assets). If you're asked to evaluate or test a...
Design, evaluate, and document software architectures including system design, design patterns, architecture patterns, scalability planning, and technology selection. Use when designing systems,...
Orchestrate parallel CLI agents (Claude Code, Codex, Gemini) for competitive evaluation. Use when user says "run multi-agent", "compare agents", "launch competitive evaluation", "use parallel...
Expert in Langfuse - the open-source LLM observability platform. Covers tracing, prompt management, evaluation, datasets, and integration with LangChain, LlamaIndex, and OpenAI. Essential for...
Help users generate and evaluate startup ideas. Use when someone is brainstorming business ideas, trying to find a startup concept, evaluating whether an idea is worth pursuing, or looking for...
Comprehensive skill evaluation and debugging framework for testing agent skills. Use when users need to (1) evaluate a skill's recall rate (how often it triggers correctly), (2) test a skill's...
Expert UX evaluator for command-line interfaces, CLIs, terminal tools, shell scripts, and developer APIs. Use proactively when reviewing CLIs, testing command usability, evaluating error messages,...
Apply Peter Thiel's Zero to One principles for building new products, evaluating ideas, and strategic thinking. Use when creating something new (not iterating), assessing market opportunities,...
Use when verifying claims before decisions, fact-checking statements against sources, conducting due diligence on vendor/competitor assertions, evaluating conflicting evidence, triangulating...
Evaluate module design using APOSD principles with a 40-item checklist. Detect complexity symptoms (change amplification, cognitive load, unknown unknowns), shallow modules, information leakage,...
Master context engineering for AI agent systems. Use when designing agent architectures, debugging context failures, optimizing token usage, implementing memory systems, building multi-agent...
Conducts comprehensive architecture design reviews including system design validation, architecture pattern assessment, quality attributes evaluation, technology stack review, and scalability...
Discover and analyze influencers across Instagram, Twitter/X, LinkedIn, YouTube, and Reddit using the anysite MCP server. Find content creators by niche, analyze engagement metrics, evaluate audience...