Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Agent Context Isolation
Draw diagrams, flowcharts, and visualizations on an Excalidraw canvas. Use when the user asks to draw, visualize, create diagrams, or sketch ideas.
Multi-agent orchestration patterns. Use when multiple independent tasks can run with different domain expertise or when comprehensive analysis requires multiple perspectives.
This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents,...
Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation...
Create and configure Claude Code sub-agents with custom prompts, tools, and models
Planning agent that creates implementation plans and handoffs from conversation context
Validation agent that validates plan tech choices against current best practices
Research agent for external documentation, best practices, and library APIs via MCP tools
Configure AI coding agents to be honest, objective, and non-sycophantic. Use when the user wants to set up honest feedback, disable people-pleasing behavior, enable objective criticism, or...
Review and improve AI agent instruction documents (AGENTS.md, Claude.md, etc.) for quality, clarity, and effectiveness. Use when users request review of agent documentation, ask to evaluate...
Meta-agent for creating new custom agents, skills, and MCP integrations. Expert in agent design, MCP development, skill architecture, and rapid prototyping. Activate on 'create agent', 'new...
Create, update, and organize Claude Code rules (.claude/rules/) following official best practices. Use when "create rule", "add rule", "ルール作成", "ルール追加", or configuring path-specific conventions.
List all available agents (core + expert)
Background Agent Pings
Guides ecosystem-level refactors of ap-* agents. Use when agents overlap, responsibilities are unclear, or you need to merge, split, rename, or re-scope agents and formalize collaboration contracts.