Kalyanikhandare29 / agent-skills-for-context-engineering-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

guanyang / antigravity-skills-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

muratcankoylan / agent-skills-for-context-engineering-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

cosmix / loom-model-evaluation exact

Evaluates machine learning models for performance, fairness, and reliability using appropriate metrics and validation techniques. Covers training debugging, hyperparameter tuning, and production...

itsAR-VR / goatedskills-advanced-evaluation exact

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...

shipshitdev / library-advanced-evaluation exact

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...

omer-metin / skills-for-antigravity-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

liqiongyu / lenny-skills-plus-evaluating-candidates exact

Make an evidence-based hiring decision and produce a Candidate Evaluation Decision Pack (criteria + scorecard, signal log, work sample/trial plan + rubric, reference check script + summary,...

huggingface / skills-hugging-face-evaluation exact

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model...

OmidZamani / dspy-skills-dspy-evaluation-suite exact

This skill should be used when the user asks to "evaluate a DSPy program", "test my DSPy module", "measure performance", "create evaluation metrics", "use answer_exact_match or SemanticF1",...

omer-metin / skills-for-antigravity-decision-frameworks exact

Expert in decision-making frameworks - systematic approaches to making better decisions under uncertainty. Covers decision criteria, reversibility assessment, stakeholder alignment, and decision...

delineas / astro-framework-agents-astro-framework exact

Comprehensive Astro framework development guide for building fast, content-driven websites using islands architecture. Use this skill when creating Astro components, implementing islands with...

RefoundAI / lenny-skills-evaluating-new-technology exact

Help users evaluate emerging technologies. Use when someone is assessing new tools, making build vs buy decisions, evaluating AI vendors, or deciding on technical architecture.

eugenepyvovarov / mcpbundler-agent-skills-marketplace-hugging-face-evaluation-manager exact

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model...

liqiongyu / lenny-skills-plus-evaluating-trade-offs exact

Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options+criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers)....

mikeyobrien / ralph-orchestrator-eval exact

EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan...

404kidwiz / claude-supercode-skills-dotnet-framework-4-8-expert exact

Legacy .NET Framework expert specializing in .NET Framework 4.8, WCF services, ASP.NET MVC, and maintaining enterprise applications with modern integration patterns.

eddiebe147 / claude-settings-model-evaluator exact

Evaluate and compare ML model performance with rigorous testing methodologies

RefoundAI / lenny-skills-evaluating-candidates exact

Help users make better hiring decisions. Use when someone is evaluating job candidates, making hiring decisions, conducting reference checks, reviewing work samples or take-homes, calibrating...

aj-geddes / useful-ai-prompts-test-automation-framework exact

Design and implement scalable test automation frameworks with Page Object Model, fixtures, and reporting. Use for test framework, page object pattern, test architecture, test organization, and...