Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Guide for authoring specialized AI agents. Use when creating, updating, or improving agents, choosing models, defining focus areas, configuring tools, or learning agent best practices.
Implements Manus-style file-based planning for complex tasks. Creates task_plan.md, findings.md, and progress.md. Use when starting complex multi-step tasks, research projects, or any task...
MANDATORY for code review - must use Codex CLI for all code reviews, then apply fixes based on Codex feedback. Also use for cross-verification, debugging, and getting alternative implementations.
Manage multiple local CLI agents via tmux sessions (start/stop/monitor/assign) with cron-friendly scheduling.
Review and improve AI agent instruction documents (AGENTS.md, Claude.md, etc.) for quality, clarity, and effectiveness. Use when users request review of agent documentation, ask to evaluate...
Claude in Chrome - browser automation via the official Anthropic extension. Control your logged-in Chrome browser, automate workflows, fill forms, extract data, and run scheduled tasks.
A hybrid memory system that provides persistent, searchable knowledge management for AI agents (Architecture, Patterns, Decisions).
Configure popular MCP servers for enhanced agent capabilities
Fast workflow for small changes, bug fixes, and UI tweaks that don't require full feature development. Uses sub-agent orchestration with model selection (Sonnet 4.5 orchestrator, Haiku 4.5...
Token-efficient parallel execution mode using Haiku and Sonnet agents
Orchestrate parallel CLI agents (Claude Code, Codex, Gemini) for competitive evaluation. Use when user says "run multi-agent", "compare agents", "launch competitive evaluation", "use parallel...
Optimize multi-agent systems with coordinated profiling, workload distribution, and cost-aware orchestration. Use when improving agent performance, throughput, or reliability.
Optimize multi-agent systems with coordinated profiling, workload distribution, and cost-aware orchestration. Use when improving agent performance, throughput, or reliability.
Optimize multi-agent systems with coordinated profiling, workload distribution, and cost-aware orchestration. Use when improving agent performance, throughput, or reliability.
Comprehensive project planning and documentation generator for software projects. Creates structured requirements documents, system design documents, and task breakdown plans with implementation...
Expert in CrewAI - the leading role-based multi-agent framework used by 60% of Fortune 500 companies. Covers agent design with roles and goals, task definition, crew orchestration, process types...
Inter-agent communication patterns including message passing, shared memory, blackboard systems, and event-driven architectures for LLM agentsUse when "agent communication, message passing,...
Activate maximum performance mode with parallel agent orchestration for high-throughput task completion
Expert in LangGraph - the production-grade framework for building stateful, multi-actor AI applications. Covers graph construction, state management, cycles and branches, persistence with...