Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Browser automation powers web testing, scraping, and AI agent interactions. The difference between a flaky script and a reliable system comes down to understanding selectors, waiting strategies,...
LLM and AI application security testing skill for prompt injection, jailbreaking, and AI system vulnerabilities. This skill should be used when testing AI/ML applications for security issues,...
Browser automation for E2E testing. Use when testing user journeys, verifying UI behavior, or running end-to-end tests.
Complete Claude Code hooks reference - input/output schemas, registration, testing patterns
Advanced swarm orchestration patterns for research, development, testing, and complex distributed workflows
Comprehensive GitHub release orchestration with AI swarm coordination for automated versioning, testing, deployment, and rollback management
Test-Driven Development (TDD) specialist enforcing write-tests-first methodology. MUST USE when: fixing bugs, implementing new features, refactoring code, '/fix-issue' invoked,...
Automated Dynamic Application Security Testing (DAST) using Playwright MCP for browser-based security scanning. Performs blackbox/greybox security testing on single or multiple domains with...
Vitest-specific testing utilities, mocking, and assertion patterns. Extends platform-testing with Vitest rules. Use when writing tests with Vitest.
Vitest testing framework: Vite-powered tests, Jest-compatible API, mocking, snapshots, coverage, browser mode, and TypeScript support.
Install, configure, and use the Databricks CLI to manage workspaces and resources from the terminal or scripts. Covers installation (Homebrew, WinGet, curl, source), authentication (OAuth U2M and...
Automate QA regression testing with reusable test skills. Create login flows, dashboard checks, user creation, and other common test scenarios that run consistently.
Resilience testing specialist for failure injection, game day planning, and building confidence in system reliability. Use when "chaos engineering, resilience testing, failure injection, game day,...
Implement continuous AI iteration loops for complex development tasks. Use when building features requiring test-driven refinement, implementing tasks with clear success criteria, or automating...
The intersection of AI generation and performance marketing. This skill covers creating ad creatives at scale using AI tools—from static images to video ads to dynamic creative optimization—while...
Parallel task orchestration CLI that dispatches work to AI workers (via Claude Code) in isolated git workspaces. Use when the user wants to draft, create, run, or manage tasks, delegate tasks to...