Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...
End-to-End Testing Framework skill - Browser automation, API testing, performance benchmarking, test report generation, and chaos engineering basics. Use for comprehensive application testing.
CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache, SIMD, make it faster, 性能优化 (performance optimization), 基准测试 (benchmarking)
Analyzes and optimizes Android app performance. Use when identifying UI jank, memory leaks, slow startup, high battery drain, or Compose recomposition issues. Covers profiling tools, benchmarks,...
Profile application performance, identify bottlenecks, and optimize hot paths using CPU profiling, flame graphs, and benchmarking. Use when investigating performance issues or optimizing critical...
SkillsBench contribution workflow. Use when: (1) Creating benchmark tasks, (2) Understanding repo structure, (3) Preparing PRs for task submission.
Expert performance optimizer using ALL MCP servers. Uses MongoDB for metrics, UltraThink for analysis, Memory for benchmarks, and search MCPs for optimization techniques.
Competitive intelligence gathering using anysite MCP server across LinkedIn, social media, Y Combinator, and the web. Track competitor activities, analyze hiring patterns, monitor content...
Use when designing multi-tenant OCI environments, setting up production landing zones, implementing compartment hierarchies, or establishing governance foundations. Covers Landing Zone reference...
Rocky Linux 8/9 security hardening including CIS benchmarks with OpenSCAP, SSH hardening, fail2ban, auditd rules, PAM configuration with authselect, and system-wide crypto policies. Use when...
Multi-dimensional rep evaluation: activity, conversion, velocity, deal size. Peer benchmarking and coaching priority identification.
Advanced test optimization with cargo-nextest, property testing, and performance benchmarking. Use when optimizing test execution speed, implementing property-based tests, or analyzing test performance.
Use this skill when users need to analyze competitors, monitor market movements, benchmark features/pricing, identify market gaps, or understand competitive positioning. Activates for "what are...
Spatial indexing and world streaming for Three.js building games with thousands of pieces. Use when optimizing building games, implementing spatial queries, chunk loading, or profiling...
Create an AI Evals Pack (eval PRD, test set, rubric, judge plan, results + iteration loop). Use for LLM evaluation, benchmarks, rubrics, error analysis/open coding, and ship/no-ship quality gates...
Create user-centered, accessible interface copy (microcopy) for digital products including buttons, labels, error messages, notifications, forms, onboarding, empty states, success messages, and...