40 results (18.2 ms), page 1 of 2
Kalyanikhandare29 / agent-skills-for-context-engineering-evaluation exact

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...

guanyang / antigravity-skills-evaluation exact

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...

muratcankoylan / agent-skills-for-context-engineering-evaluation exact

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...

shipshitdev / library-evaluation exact

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

mjunaidca / mjs-agent-skills-evaluation exact

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

itsAR-VR / goatedskills-evaluation exact

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.
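
The three entries above share one description: build an evaluation framework for agent systems and track improvements over time. As a rough illustration only, here is a minimal sketch of what such a framework's core loop could look like; every name in it (`EvalCase`, `run_suite`, `my_agent`) is hypothetical and not taken from any of these skills.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One test case: an input prompt plus a checker for the agent's output."""
    prompt: str
    check: Callable[[str], bool]

def run_suite(run_agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the agent and return the pass rate."""
    passed = sum(1 for case in cases if case.check(run_agent(case.prompt)))
    return passed / len(cases)

# Re-running the same suite after each prompt or context change gives a
# simple "improvement over time" signal.
cases = [EvalCase("What is 2 + 2?", lambda out: "4" in out)]
# print(run_suite(my_agent, cases))  # my_agent wraps your agent as a str -> str callable
```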

cosmix / loom-model-evaluation exact

Evaluates machine learning models for performance, fairness, and reliability using appropriate metrics and validation techniques. Covers training debugging, hyperparameter tuning, and production...
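
The entry above targets classical ML model evaluation rather than LLM evaluation. A minimal sketch of that workflow, assuming scikit-learn (the skill itself may use different tooling):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# 5-fold cross-validation gives a steadier performance estimate than a
# single train/test split; swap the scoring string for other metrics.
scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print(f"F1 per fold: {scores.round(3)}, mean: {scores.mean():.3f}")
```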

Kalyanikhandare29 / agent-skills-for-context-engineering-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

guanyang / antigravity-skills-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...

muratcankoylan / agent-skills-for-context-engineering-advanced-evaluation exact

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
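
The three advanced-evaluation entries above cover LLM-as-judge techniques such as direct scoring against a rubric. A minimal sketch of direct scoring, where `call_llm` is a placeholder for whatever model client you use, not an API from these skills:

```python
import json

RUBRIC = """Score the answer from 1 to 5:
5 = fully correct and complete, 3 = partially correct, 1 = incorrect.
Respond as JSON: {"score": <int>, "reason": "<short justification>"}"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def direct_score(question: str, answer: str) -> dict:
    """Ask the judge model to grade one answer against the rubric."""
    prompt = f"{RUBRIC}\n\nQuestion: {question}\nAnswer: {answer}"
    return json.loads(call_llm(prompt))
```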

daymade / claude-code-skills-promptfoo-evaluation exact

Configures and runs LLM evaluation using the Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing...
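
Promptfoo's custom Python assertions are built around a `get_assert(output, context)` hook that returns a bool, a float, or a grading dict; the sketch below assumes that interface. The `topic` var and the check itself are hypothetical, for illustration only.

```python
# custom_assertion.py  (referenced from promptfooconfig.yaml as a python assertion)
def get_assert(output: str, context) -> dict:
    """Pass if the model output is non-empty and mentions the expected topic."""
    # context typically carries the test's vars and the rendered prompt.
    topic = ""
    if isinstance(context, dict):
        topic = str(context.get("vars", {}).get("topic", ""))
    ok = bool(output.strip()) and (topic.lower() in output.lower() if topic else True)
    return {
        "pass": ok,
        "score": 1.0 if ok else 0.0,
        "reason": "non-empty and on topic" if ok else "empty or off-topic output",
    }
```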

ovachiever / droid-tings-scholar-evaluation exact

Systematic framework for evaluating scholarly and research work based on the ScholarEval methodology. This skill should be used when assessing research papers, evaluating literature reviews,...

jackspace / claudeskillz-scholar-evaluation exact

Systematic framework for evaluating scholarly and research work based on the ScholarEval methodology. This skill should be used when assessing research papers, evaluating literature reviews,...

halay08 / fullstack-agent-skills-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

404kidwiz / agent-skills-backup-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

rmyndharis / antigravity-skills-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

shishiv / gsd-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

ovachiever / droid-tings-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...
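
The five entries above share one description centered on automated metrics for LLM applications. A minimal sketch of two common automated metrics, exact match and token-level F1, computed over a reference benchmark; `predict` and the benchmark pairs are placeholders, not part of any listed skill.

```python
from collections import Counter

def token_f1(pred: str, ref: str) -> float:
    """Token-overlap F1 between a prediction and a reference answer."""
    pred_toks, ref_toks = pred.lower().split(), ref.lower().split()
    common = sum((Counter(pred_toks) & Counter(ref_toks)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred_toks), common / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

def evaluate(predict, benchmark: list[tuple[str, str]]) -> dict:
    """Run predict() over (question, reference) pairs and aggregate both metrics."""
    pairs = [(predict(question), reference) for question, reference in benchmark]
    return {
        "exact_match": sum(p.strip() == r.strip() for p, r in pairs) / len(pairs),
        "token_f1": sum(token_f1(p, r) for p, r in pairs) / len(pairs),
    }
```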

shipshitdev / library-advanced-evaluation exact

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...
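
The last entry names pairwise comparison and bias mitigation. A minimal sketch of one standard bias-mitigation trick, judging both answer orderings and accepting a winner only when the verdicts agree; `call_judge` is a placeholder for your model client, not an API from this skill.

```python
def call_judge(prompt: str) -> str:
    """Placeholder: should return exactly 'A' or 'B' from your judge model."""
    raise NotImplementedError("plug in your model client here")

def pairwise_compare(question: str, answer_1: str, answer_2: str) -> str:
    template = ("Which answer to the question is better? Reply with exactly 'A' or 'B'.\n"
                "Question: {q}\nAnswer A: {a}\nAnswer B: {b}")
    first = call_judge(template.format(q=question, a=answer_1, b=answer_2))
    second = call_judge(template.format(q=question, a=answer_2, b=answer_1))
    # Swapping the order flips the labels: agreement means ('A', 'B') for
    # answer_1 or ('B', 'A') for answer_2; anything else suggests position bias.
    if (first, second) == ("A", "B"):
        return "answer_1"
    if (first, second) == ("B", "A"):
        return "answer_2"
    return "tie"
```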