Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Expert in Solidity smart contract development with security and gas optimization
Plan features interactively. Asks clarifying questions, then generates a detailed PRD document.
Deep research via Gemini CLI; runs in a background sub-agent.
This skill should be used when fixing bugs, implementing features, debugging issues, or making code changes. Ensures understanding of code flow before implementation by: (1) Tracing execution path...
Write and execute Python code to process data, analyze scraped content, or perform computations
Feynman Technique for learning a topic deeply: explain a concept simply, identify gaps, fill them, then refine. Use when learning something new, testing understanding, or preparing to teach.
GTD mentor for inbox processing, weekly reviews, and coaching. Triggers on "process inbox", "weekly review", "what should I do", "I'm stuck", or /gtd command.
Site reliability specialist for Prometheus metrics, distributed tracing, alerting strategies, and SLO design. Use when "observability, monitoring, prometheus, grafana, alerting, slo, sli, metrics,...
Cynefin sense-making framework categorizing problems as Simple, Complicated, Complex, Chaotic, or Confused to select the right approach. Use when unsure how to tackle a problem.
Causal inference specialist for causal discovery, counterfactual reasoning, and effect estimation. Use when "causal inference, causal discovery, counterfactual, intervention effect, confounder,...
Memory is the cornerstone of intelligent agents. Without it, every interaction starts from zero. This skill covers the architecture of agent memory: short-term (context window), long-term (vector...