Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...
World-class ML engineering skill for productionizing ML models, MLOps, and building scalable ML systems. Expertise in PyTorch, TensorFlow, model deployment, feature stores, model monitoring, and...
Expert sales enablement strategist for building high-performing sales teams. Use when designing sales training programs, onboarding and ramp plans, sales playbooks, coaching frameworks,...
This skill should be used when users need to work with Expo SDK and Expo Router for building React Native applications. It provides comprehensive guidance on navigation patterns, media handling...
This skill should be used when the user asks to "create custom DSPy module", "design a DSPy module", "extend dspy.Module", "build reusable DSPy component", mentions "custom module patterns",...
Use when validating product assumptions before building, discovering unmet user needs, understanding customer problems and workflows, testing concepts or positioning, researching target markets,...
Complete React + Vite expertise for building optimized, scalable applications. Covers project architecture, folder structure, component patterns, performance optimization, TypeScript best...
Expert FastAPI developer specializing in production-ready async REST APIs with Pydantic v2, SQLAlchemy 2.0, OAuth2/JWT authentication, and comprehensive security. Deep expertise in dependency...
Expert product demonstration specialist for SaaS and B2B software. Use when preparing demos, structuring demo presentations, tailoring to stakeholders, handling objections during demos, managing...
Comprehensive Azure cloud expertise covering all major services (App Service, Functions, Container Apps, AKS, databases, storage, monitoring). Use when working with Azure infrastructure,...
When the user wants to add, fix, or optimize schema markup and structured data on their site. Also use when the user mentions "schema markup," "structured data," "JSON-LD," "rich snippets,"...
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational...
Use when evaluating business model viability, analyzing profitability per customer/product/transaction, validating startup metrics (CAC, LTV, payback period), making pricing decisions, assessing...