Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world...
Guide for designing and writing ap-* agent specifications. Use when creating a new agent, drafting agent frontmatter, defining agent workflows, or structuring agent collaborations. For...
Multi-agent orchestration patterns. Use when multiple independent tasks can run with different domain expertise or when comprehensive analysis requires multiple perspectives.
This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents,...
Voice agents represent the frontier of AI interaction: humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis; it's achieving natural conversation...
Create and configure Claude Code sub-agents with custom prompts, tools, and models
Planning agent that creates implementation plans and handoffs from conversation context
Validation agent that validates plan tech choices against current best practices
Research agent for external documentation, best practices, and library APIs via MCP tools
Creates and modifies Claude Code sub-agents following best practices. Use when the user requests creating, updating, modifying, improving, or editing sub-agents. Triggers include "create/make/new...
Meta-agent for creating new custom agents, skills, and MCP integrations. Expert in agent design, MCP development, skill architecture, and rapid prototyping. Activate on 'create agent', 'new...
List all available agents (core + expert)
Create OpenAI Agents SDK applications in TypeScript/JavaScript. Use when building AI agents, multi-agent systems, voice agents, or any agentic workflow with the OpenAI Agents SDK. Covers agents,...
Agent Context Isolation
Background Agent Pings
Draw diagrams, flowcharts, and visualizations on an Excalidraw canvas. Use when the user asks to draw, visualize, create diagrams, or sketch ideas.
Expert in designing, orchestrating, and managing multi-agent systems (MAS). Specializes in agent collaboration patterns, hierarchical structures, and swarm intelligence. Use when building agent...
Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI...