Workflow automation is the infrastructure that makes AI agents reliable. Without durable execution, a network hiccup during a 10-step payment flow means lost money and angry customers. With it,...
Process external code review feedback with technical rigor. Use when receiving feedback from another LLM, human reviewer, or CI tool. Verifies claims before implementing, tracks disposition.
Implement comprehensive observability for LLM applications including tracing (Langfuse/Helicone), cost tracking, token optimization, RAG evaluation metrics (RAGAS), hallucination detection, and...
Expert in designing effective prompts for LLM-powered applications. Masters prompt structure, context management, output formatting, and prompt evaluation. Use when "prompt engineering, system...
Persistent memory systems for LLM conversations, including short-term, long-term, and entity-based memory. Use when: conversation memory, remember, memory persistence, long-term memory, chat history.
Validates Terminal User Interface (TUI) output using freeze for screenshot capture and LLM-as-judge for semantic validation. Supports both visual (PNG/SVG) and text-based validation modes.
This skill should be used when the user asks to "implement agent memory", "persist state across sessions", "build knowledge graph", "track entities", or mentions memory architecture, temporal...
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. Use when "building RAG,...
Clean code patterns for Azure AI Search Python SDK (azure-search-documents). Use when building search applications, creating/managing indexes, implementing agentic retrieval with knowledge bases,...
Create or refine well-formed backlog work items for AI-assisted development. Use when drafting a new item, refining an underspecified task, splitting a large task, or validating readiness before...
Detect and surface potential decision-making in work items, plans, diffs, docs, or summaries. Use to flag possible decisions neutrally without enforcing process.
Detect risk or uncertainty and pause execution. Use before or after plans, commands, diffs, or implementation to surface unclear requirements or risky actions.
Best practices for building Stripe integrations.
Execute a single, well-formed work item with minimal, verified changes. Use when a backlog item is ready for implementation and work must proceed safely without scope creep.
Flag possible documentation duplication, misplacement, or verbosity. Use when drafting or reviewing docs, backlog items, ADRs, or explanatory text to steer toward a single source of truth.
Braintrust tracing for Claude Code - hook architecture, sub-agent correlation, debugging
Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...