Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.
This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...
Select optimal LLM(s) for a task based on skill requirements, budget, and constraints. Uses the `which-llm` CLI to query Artificial Analysis benchmarks enriched with capability data from models.dev.
Comprehensive guide for Dependency-Track, a Software Composition Analysis (SCA) and SBOM management platform. Use when deploying Dependency-Track, integrating with CI/CD pipelines, configuring...
Control Spotify playback and manage playlists via MCP server. Use when user requests playing music, controlling Spotify, creating playlists, searching songs, or managing their Spotify library.
Validates OSCAL System Security Plan documents against schemas, profiles, and cross-reference requirements with tiered validation depth.
Design Redis architectures with caching patterns, data structures, eviction policies, persistence (RDB/AOF), and high availability (Sentinel/Cluster).
Comprehensive guide for building Solana apps with @solana/kit (web3.js 2.0). Use when you need modern RPC/subscriptions, transaction building, signing, and program interactions in JavaScript/TypeScript.
Analyzes and optimizes frontend performance using Core Web Vitals, bundle analysis, lazy loading, image optimization, and caching strategies.
Build Python agents with Agentica SDK - @agentic decorator, spawn(), persistence, MCP integration
Multi-person projects - shared state, todo claiming, handoffs
Generate technical documentation, API docs, and content with accessibility and SEO optimization.
Generate and validate Kubernetes YAML manifests with best practices for Deployments, Services, ConfigMaps, and security policies.
Find relevant items under uncertainty across repositories, databases, web sources, or any searchable corpus. Use when exploring unknown territory, finding related information, or discovering...