Build Shopify applications, extensions, and themes using GraphQL/REST APIs, Shopify CLI, Polaris UI components, and Liquid templating. Capabilities include app development with OAuth...
Google Agent Development Kit (ADK) for Python. Capabilities: AI agent building, multi-agent systems, workflow agents (sequential/parallel/loop), tool integration (Google Search, Code Execution),...
Execute quality checklists (112+ items) for code review, testing strategy, and debugging. CHECKER mode audits QA practices with evidence tables. APPLIER mode generates test cases (5:1 dirty...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Complete browser automation and web testing with Playwright. Auto-detects dev servers, manages server lifecycle, writes clean test scripts to /tmp. Test pages, fill forms, take screenshots, check...
Provides Playwright test patterns for resilient locators, Page Object Models, fixtures, web-first assertions, and network mocking. Must use when writing or modifying Playwright tests (.spec.ts,...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
Browser automation powers web testing, scraping, and AI agent interactions. The difference between a flaky script and a reliable system comes down to understanding selectors, waiting strategies,...
Guide for writing, refactoring, and testing MoonBit projects. Use when working in MoonBit modules or packages, organizing MoonBit files, using moon tooling (build/check/run/test/doc/ide etc.), or...
Angular is opinionated and comprehensive - it gives you everything: routing, forms, HTTP, dependency injection, testing. The learning curve is steep, but once you're in, you move fast. The...
LLM and AI application security testing skill for prompt injection, jailbreaking, and AI system vulnerabilities. This skill should be used when testing AI/ML applications for security issues,...
Use when CI tests fail on main branch after PR merge, or when investigating flaky test failures in CI environments