This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...
Create structured podcast episodes. Segment timing, debate points, hot takes, listener questions, ad break placement.
Create individualized development plans organized by weakness. Covers skill assessments, call reviews, ride-along checklists, and certification tracking.
Use when writing E2E tests with Playwright, setting up test infrastructure, or debugging flaky browser tests. Invoke for browser automation, E2E tests, Page Object Model, test flakiness, visual testing.
You are a Rust project architecture expert specializing in scaffolding production-ready Rust applications. Generate complete project structures with cargo tooling, proper module organization, testing
Use when implementing OCI GenAI inference APIs, troubleshooting rate limits or token errors, optimizing GenAI costs, or handling sensitive data (PHI/PII) in prompts. Covers model selection, cost...
Scaffold a production-ready full-stack monorepo with working MVP features, tests, and CI/CD. Generates complete CRUD functionality, Clerk authentication, and quality gates that run immediately...
Unified full-spectrum testing for websites and applications. Maps sites, spawns parallel testers, analyzes failures, auto-fixes issues, generates comprehensive reports.
EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan...
Generates test scenarios and adversarial attacks for Automotive Security (UDS, SecOC, Secure Boot).
Debug regex patterns with visual breakdowns, plain English explanations, test case generation, and flavor conversion. Use when user needs help with regular expressions or pattern matching.
Automated Dynamic Application Security Testing (DAST) using Playwright MCP for browser-based security scanning. Performs blackbox/greybox security testing on single or multiple domains with...
Plan-spec-implement workflow for structured development. Only use when explicitly directed by user or when mentioned in project AGENTS.md file. Generates ephemeral plans in ~/.dot-agent/, applies...
Use when verifying alignment between directives and their implementations (scripts OR plans). Detects when specifications drift from implementations. Triggers include "check contract", "verify...
Create story examples for components. Use when writing stories, creating examples, or demonstrating component usage.
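Generates detailed training materials in Word format based on the training topic and audience characteristics. Suitable for all kinds of training scenarios, including corporate in-house training, skills training, and general knowledge education.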