Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
LLM and AI application security testing skill for prompt injection, jailbreaking, and AI system vulnerabilities. This skill should be used when testing AI/ML applications for security issues,...
Quality assurance expert for testing strategies and quality gates. Use when planning test coverage, setting up QA processes, or improving quality standards.
Advanced swarm orchestration patterns for research, development, testing, and complex distributed workflows
Build production CI/CD pipelines with GitHub Actions. Implements matrix builds, caching, deployments, testing, security scanning. Use for automated testing, deployments, release workflows....
Automated Dynamic Application Security Testing (DAST) using Playwright MCP for browser-based security scanning. Performs blackbox/greybox security testing on single or multiple domains with...
Comprehensive quality assurance combining testing strategy, code quality enforcement, and validation gates. Consolidated from testing-strategist, code-quality-enforcer, and validation-gate-checker.
Vitest-specific testing utilities, mocking, and assertion patterns. Extends platform-testing with Vitest rules. Use when writing tests with Vitest.
Automate QA regression testing with reusable test skills. Create login flows, dashboard checks, user creation, and other common test scenarios that run consistently.
Automate QA regression testing with reusable test skills. Create login flows, dashboard checks, user creation, and other common test scenarios that run consistently.
Automate QA regression testing with reusable test skills. Create login flows, dashboard checks, user creation, and other common test scenarios that run consistently.
Resilience testing specialist for failure injection, game day planning, and building confidence in system reliabilityUse when "chaos engineering, resilience testing, failure injection, game day,...
Designs and implements CI/CD pipelines for automated testing, building, deployment, and security scanning across multiple platforms. Covers pipeline optimization, test integration, artifact...
Vitest fast unit testing framework powered by Vite with Jest-compatible API. Use when writing tests, mocking, configuring coverage, or working with test filtering and fixtures.
The intersection of AI generation and performance marketing. This skill covers creating ad creatives at scale using AI tools—from static images to video ads to dynamic creative optimization—while...
Comprehensive GitHub release orchestration with AI swarm coordination for automated versioning, testing, deployment, and rollback management