Evaluates machine learning models for performance, fairness, and reliability using appropriate metrics and validation techniques. Covers training debugging, hyperparameter tuning, and production...
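As an illustration of pairing a performance metric with a fairness check, here is a minimal sketch; the arrays and the demographic-parity helper are hypothetical examples, not part of the skill itself:

```python
# A minimal sketch: report accuracy alongside a simple fairness metric.
# The data and group labels below are hypothetical placeholders.
import numpy as np
from sklearn.metrics import accuracy_score

def demographic_parity_difference(y_pred, group):
    """Gap in positive-prediction rate between the best- and worst-treated groups."""
    rates = [np.mean(y_pred[group == g]) for g in np.unique(group)]
    return max(rates) - min(rates)

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0])
group  = np.array(["a", "a", "a", "b", "b", "b"])

print("accuracy:", accuracy_score(y_true, y_pred))
print("demographic parity gap:", demographic_parity_difference(y_pred, group))
```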
Evaluate and compare ML model performance with rigorous testing methodologies.
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ other benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language...
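For context, pass@k is conventionally computed with the unbiased estimator from Chen et al. (2021); a minimal sketch, assuming n generated samples per problem of which c pass the tests:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).
    n: total samples generated, c: samples that pass the unit tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```

For example, pass_at_k(5, 2, 1) reduces to c/n = 0.4, matching the naive estimate at k = 1.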
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model...
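A sketch of what writing scores into a card's `model-index` metadata can look like, assuming the `huggingface_hub` library's `metadata_update` helper; the repo id and score are placeholders, not real results:

```python
# Sketch: merge eval results into a model card's YAML metadata, assuming
# huggingface_hub's metadata_update helper. Repo id and value are placeholders.
from huggingface_hub import metadata_update

metadata = {
    "model-index": [{
        "name": "your-model",
        "results": [{
            "task": {"type": "text-generation"},
            "dataset": {"name": "MMLU", "type": "mmlu"},
            "metrics": [{"type": "accuracy", "value": 0.0}],  # placeholder score
        }],
    }]
}
metadata_update("your-org/your-model", metadata, overwrite=True)
```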
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...
This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...
Configures and runs LLM evaluation using the Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing...
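A minimal custom assertion sketch, following promptfoo's Python hook in which a file referenced from promptfooconfig.yaml (e.g. `file://assert_keywords.py`) exposes `get_assert`; the keyword list is hypothetical:

```python
# promptfoo calls get_assert(output, context) on each test case and accepts
# a bool, a float, or a grading-result dict like the one returned here.
def get_assert(output: str, context) -> dict:
    required = ["refund", "policy"]  # hypothetical keywords for illustration
    missing = [w for w in required if w not in output.lower()]
    return {
        "pass": not missing,
        "score": 1.0 - len(missing) / len(required),
        "reason": f"missing keywords: {missing}" if missing else "all keywords present",
    }
```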
Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...
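A minimal sketch of rubric-based direct scoring, assuming the judge returns valid JSON; `call_llm` is the same hypothetical client stand-in used in the pairwise sketch above:

```python
# Direct scoring against a fixed rubric; each criterion gets a 1-5 score
# and an overall mean is attached. The rubric and criteria are illustrative.
import json

RUBRIC = ('Score the response from 1-5 on each criterion and reply as JSON only: '
          '{"accuracy": int, "completeness": int, "clarity": int}')

def direct_score(question: str, response: str, call_llm) -> dict:
    prompt = f"{RUBRIC}\n\nQuestion: {question}\nResponse: {response}"
    scores = json.loads(call_llm(prompt))  # assumes the judge emits valid JSON
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores
```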
Use when asked to compare multiple ML models, perform cross-validation, evaluate metrics, or select the best model for a classification/regression task.
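A minimal sketch of such a comparison using scikit-learn's k-fold cross-validation; the candidate models, dataset, and F1 scoring are arbitrary choices for illustration:

```python
# Compare candidate classifiers on the same folds and report mean ± std F1.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: f1 = {scores.mean():.3f} ± {scores.std():.3f}")
```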
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking...
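The listing does not name its backend; purely as an illustration, the open-source lm-evaluation-harness exposes runs over these benchmarks from Python like this (the model id is a placeholder):

```python
# Illustration only, assuming lm-evaluation-harness as the runner.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder model
    tasks=["mmlu", "gsm8k", "hellaswag"],
    num_fewshot=0,
)
print(results["results"])  # per-task metric table
```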
Use this skill when users need to stress-test their business model, identify scale limitations, find bottlenecks, determine if they're trading time for money, or evaluate unit economics. Activates...
This skill should be used when the user asks for "model council", "multi-model", "compare models", "ask multiple AIs", "consensus across models", "run on different models", or wants to get...
Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.
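A minimal harness sketch, assuming fixed tasks with programmatic pass/fail checks scored per run; `run_agent` and the example task are hypothetical:

```python
# Fixed tasks with programmatic checks give a pass rate that is comparable
# across runs, so improvements (or regressions) over time are measurable.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # programmatic pass/fail on the agent's output

def run_agent(prompt: str) -> str:
    raise NotImplementedError("plug in the agent under test here")

def evaluate(tasks: list[Task]) -> float:
    passed = sum(t.check(run_agent(t.prompt)) for t in tasks)
    return passed / len(tasks)  # pass rate; log per commit to track drift

tasks = [Task("List the files in /tmp and report the count.",
              lambda out: "count" in out.lower())]  # hypothetical task
```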