This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
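For the pairwise case, a common bias mitigation is to judge each pair twice with the response order swapped and accept only verdicts that agree across both orderings. A minimal sketch, where `call_judge` is a hypothetical stand-in for whatever LLM client you use (assumed to return "A" or "B"):

```python
# Position-swapped pairwise comparison, a standard mitigation for
# position bias in LLM-as-judge setups. `call_judge` is a hypothetical
# placeholder for your chat-completion client.

JUDGE_PROMPT = """Compare the two responses to the prompt below and answer
with exactly "A" or "B" for the better one.

Prompt: {prompt}

Response A:
{a}

Response B:
{b}
"""

def call_judge(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def pairwise_verdict(prompt: str, out1: str, out2: str) -> str:
    """Judge twice with the candidates swapped; only a consistent
    verdict counts, otherwise declare a tie."""
    first = call_judge(JUDGE_PROMPT.format(prompt=prompt, a=out1, b=out2))
    second = call_judge(JUDGE_PROMPT.format(prompt=prompt, a=out2, b=out1))
    if first == "A" and second == "B":
        return "model_1"
    if first == "B" and second == "A":
        return "model_2"
    return "tie"  # inconsistent across orderings: position bias suspected
```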
This skill should be used when the user asks to "diagnose context problems", "fix lost-in-middle issues", "debug agent failures", "understand context poisoning", or mentions context degradation,...
Audit AI systems for safety, bias, and responsible deployment
Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance, Microsoft Research's constrained...
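A minimal sketch of what regex-constrained decoding looks like with the open-source `guidance` package, assuming its `gen()`/`select()` API (guidance >= 0.1); the model name is a placeholder for any backend guidance supports:

```python
# Regex- and choice-constrained generation with guidance's primitives.
# "gpt2" is a placeholder model; swap in any supported backend.
from guidance import models, gen, select

lm = models.Transformers("gpt2")

# Force the continuation to match a pattern: here, an integer.
lm += "Q: How many legs does a spider have?\nA: " + gen(
    regex=r"\d+", name="legs", max_tokens=4
)

# Or constrain the output to a fixed set of choices.
lm += "\nIs that more than an insect has? " + select(["yes", "no"], name="more")

print(lm["legs"], lm["more"])  # captured values by name
```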
Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples...
Generate test scaffolding for modules with proper structure, fixtures,
Guides creation, validation, and application of Supabase database migrations with RLS policy checks and type generation. Use when adding tables, modifying schema, or updating database structure.
Generate CHANGELOG entries following conventional commits format with
Generate engaging video titles with viral potential from SRT subtitle files
Auto-activates when generating Product Requirements Prompt (PRP) documents
Use when asked to generate legal contracts, agreements, or documents from templates with variable substitution and formatting.
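As a generic illustration of the variable-substitution step (the field names here are hypothetical examples, not ones the skill defines), Python's standard-library `string.Template` is enough:

```python
# Template-based document generation via the standard library.
from string import Template

contract = Template(
    "This agreement is made on $date between $party_a and $party_b.\n"
    "Total consideration: $$${amount}."  # $$ renders a literal dollar sign
)

print(contract.substitute(
    date="2024-01-01",
    party_a="Acme Corp",
    party_b="Jane Doe",
    amount="5,000",
))
```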
Generate comprehensive test suites including unit tests, integration tests, and E2E tests
Master the art of "vibe coding" - creating playable games through natural language prompts to AI. Covers effective prompting strategies, framework choices, workflow patterns, and avoiding common...
Use when asked to generate UUIDs, GUIDs, or other unique identifiers in various formats (UUID1, UUID4, etc.).
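For reference, the variants named here map directly onto Python's standard-library `uuid` module:

```python
# The UUID variants mentioned above, via the standard library.
import uuid

print(uuid.uuid1())   # UUID1: timestamp + node (MAC-derived)
print(uuid.uuid4())   # UUID4: random
print(uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))  # UUID5: namespaced SHA-1
print(uuid.uuid4().hex)           # compact 32-char hex form
print(str(uuid.uuid4()).upper())  # uppercase GUID-style rendering
```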
Generate Jest-based unit tests for JavaScript/TypeScript code. Creates
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast...
Evaluates code generation models with pass@k metrics on 15+ benchmarks, including HumanEval, MBPP, and MultiPL-E. Use when benchmarking code models, comparing coding abilities, testing multi-language...
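The pass@k metric has a standard unbiased estimator (Chen et al., 2021, the HumanEval paper): with n samples per problem of which c pass, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch:

```python
# Unbiased pass@k estimator from the HumanEval paper: the probability
# that at least one of k samples drawn from n (with c passing) passes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # fewer than k failing samples: every draw of k must include a pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 13 correct, estimating pass@10:
print(pass_at_k(200, 13, 10))
```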
Use when new translation keys are added to packages, to generate the corresponding translation strings