Search: evaluation-framework

advanced-evaluation 0.22

Kalyanikhandare29 / agent-skills-for-context-engineering-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

★ 0 ai

accept-language agentic-context-enginnering ai ai-agents

advanced-evaluation 0.22

guanyang / antigravity-skills-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

★ 105 ai

ai-skills antigravity antigravity-ai antigravity-ide

advanced-evaluation 0.22

muratcankoylan / agent-skills-for-context-engineering-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

★ 7,926 ai

model-evaluation 0.21

cosmix / loom-model-evaluation exact

Evaluates machine learning models for performance, fairness, and reliability using appropriate metrics and validation techniques. Covers training debugging, hyperparameter tuning, and production...

★ 6 ai

agentic-coding agents claude claude-code

advanced-evaluation 0.21

itsAR-VR / goatedskills-advanced-evaluation exact

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...

★ 0 ai

advanced-evaluation 0.20

shipshitdev / library-advanced-evaluation exact

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...

★ 4 ai

claude-code codex commands skills

agent-evaluation 0.20

omer-metin / skills-for-antigravity-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 5 ai

ai-agents antigravity antigravity-ide skills

evaluating-candidates 0.20

liqiongyu / lenny-skills-plus-evaluating-candidates exact

Make an evidence-based hiring decision and produce a Candidate Evaluation Decision Pack (criteria + scorecard, signal log, work sample/trial plan + rubric, reference check script + summary,...

★ 14 ai

agent-skills ai-agents automation claude

hugging-face-evaluation 0.20

huggingface / skills-hugging-face-evaluation exact

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model...

★ 1,015 ai

dspy-evaluation-suite 0.19

OmidZamani / dspy-skills-dspy-evaluation-suite exact

This skill should be used when the user asks to "evaluate a DSPy program", "test my DSPy module", "measure performance", "create evaluation metrics", "use answer_exact_match or SemanticF1",...

★ 20 ai

agent-skills claude-code claude-skills dspy

decision-frameworks 0.19

omer-metin / skills-for-antigravity-decision-frameworks exact

Expert in decision-making frameworks - systematic approaches to making better decisions under uncertainty. Covers decision criteria, reversibility assessment, stakeholder alignment, and decision...

★ 5 ai

ai-agents antigravity antigravity-ide skills

astro-framework 0.19

delineas / astro-framework-agents-astro-framework exact

Comprehensive Astro framework development guide for building fast, content-driven websites using islands architecture. Use this skill when creating Astro components, implementing islands with...

★ 0 development

evaluating-new-technology 0.19

RefoundAI / lenny-skills-evaluating-new-technology exact

Help users evaluate emerging technologies. Use when someone is assessing new tools, making build vs buy decisions, evaluating AI vendors, or deciding on technical architecture.

★ 30 ai

ai-agents ai-assistant claude claude-code

hugging-face-evaluation-manager 0.19

eugenepyvovarov / mcpbundler-agent-skills-marketplace-hugging-face-evaluation-manager exact

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model...

★ 5 ai

agent-skill agent-skills claude codex

evaluating-trade-offs 0.19

liqiongyu / lenny-skills-plus-evaluating-trade-offs exact

Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options+criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers)....

★ 14 ai

agent-skills ai-agents automation claude

eval 0.19

mikeyobrien / ralph-orchestrator-eval exact

EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan...

★ 1,473 ai

ai ai-agents ai-agents-framework ai-developer-tools

dotnet-framework-4.8-expert 0.18

404kidwiz / claude-supercode-skills-dotnet-framework-4-8-expert exact

Legacy .NET Framework expert specializing in .NET Framework 4.8, WCF services, ASP.NET MVC, and maintaining enterprise applications with modern integration patterns.

★ 6 ai

Model Evaluator 0.18

eddiebe147 / claude-settings-model-evaluator exact

Evaluate and compare ML model performance with rigorous testing methodologies

★ 8 ai

evaluating-candidates 0.18

RefoundAI / lenny-skills-evaluating-candidates exact

Help users make better hiring decisions. Use when someone is evaluating job candidates, making hiring decisions, conducting reference checks, reviewing work samples or take-homes, calibrating...

★ 30 ai

ai-agents ai-assistant claude claude-code

test-automation-framework 0.18

aj-geddes / useful-ai-prompts-test-automation-framework exact

Design and implement scalable test automation frameworks with Page Object Model, fixtures, and reporting. Use for test framework, page object pattern, test architecture, test organization, and...

★ 55 ai
