40 results (65.2ms) page 1 / 2
guanyang / antigravity-skills-evaluation exact

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...

Kalyanikhandare29 / agent-skills-for-context-engineering-evaluation exact

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...

muratcankoylan / agent-skills-for-context-engineering-evaluation exact

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,...
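
For orientation, the evaluation skills above mention LLM-as-judge. A minimal sketch of that pattern in Python, assuming a generic model client; JUDGE_PROMPT and call_model are illustrative stand-ins, not part of any listed skill:

    JUDGE_PROMPT = """Rate the answer from 1 (poor) to 5 (excellent) for factual accuracy.
    Reply with a single integer.

    Question: {question}
    Answer: {answer}"""

    def call_model(prompt: str) -> str:
        """Hypothetical stand-in for a real LLM API call (OpenAI, Anthropic, etc.)."""
        raise NotImplementedError("wire up a real model client here")

    def judge(question: str, answer: str) -> int:
        """Ask a second model to grade an answer, then parse its 1-5 score."""
        reply = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
        score = int(reply.strip())
        if not 1 <= score <= 5:
            raise ValueError(f"judge returned out-of-range score: {score}")
        return score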

Przemocny / strategic-frameworks-use-framework exact

Apply strategic frameworks through facilitated workshop dialogue. Use when the user selected a framework via choose-framework; explicitly requests a specific framework; knows which framework to apply; or...

Przemocny / strategic-frameworks-discover-framework exact

Research and add new strategic frameworks to the system (meta-skill). Use when the user wants to add a framework not in the library; discovered a new framework in their domain; asks "Can you add...

shipshitdev / library-evaluation exact

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

itsAR-VR / goatedskills-evaluation exact

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

mjunaidca / mjs-agent-skills-evaluation exact

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

jackspace / claudeskillz-scholar-evaluation exact

Systematic framework for evaluating scholarly and research work based on the ScholarEval methodology. This skill should be used when assessing research papers, evaluating literature reviews,...

ovachiever / droid-tings-scholar-evaluation exact

Systematic framework for evaluating scholarly and research work based on the ScholarEval methodology. This skill should be used when assessing research papers, evaluating literature reviews,...

Przemocny / strategic-frameworks-choose-framework exact

Select the right strategic framework for your situation through exploratory dialogue. Use when the user describes a problem, decision, or challenge; needs a structured thinking approach; mentions...

daymade / claude-code-skills-promptfoo-evaluation exact

Configures and runs LLM evaluation using the Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing...
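
For orientation, the entry above mentions Python custom assertions for Promptfoo. A minimal sketch of one, assuming Promptfoo's documented get_assert(output, context) hook; the length threshold and citation check are illustrative assumptions, not part of this skill:

    from typing import Any, Dict

    def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
        """Custom assertion: pass if the output is substantial and cites a source."""
        long_enough = len(output.strip()) >= 40    # hypothetical threshold
        cites_source = "http" in output            # crude, illustrative citation check
        passed = long_enough and cites_source
        return {
            "pass": passed,
            "score": 1.0 if passed else 0.0,
            "reason": "ok" if passed else "too short or missing a citation",
        }

Such a file is wired into promptfooconfig.yaml as an assertion of type python with a file:// value.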

halay08 / fullstack-agent-skills-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

rmyndharis / antigravity-skills-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

404kidwiz / agent-skills-backup-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

shishiv / gsd-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...

ovachiever / droid-tings-llm-evaluation exact

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...
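
For orientation, the llm-evaluation entries above all describe automated metrics. A minimal sketch of one such metric, exact-match accuracy; the normalization and sample data are illustrative assumptions, not any skill's actual implementation:

    def normalize(text: str) -> str:
        """Lowercase and collapse whitespace so formatting noise doesn't count as error."""
        return " ".join(text.lower().split())

    def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
        """Fraction of predictions that match their reference after normalization."""
        if len(predictions) != len(references):
            raise ValueError("predictions and references must be the same length")
        if not references:
            return 0.0
        hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
        return hits / len(references)

    # Example: "4" vs "four" misses, the other two match, so accuracy is 2/3.
    print(exact_match_accuracy(["Paris", "4", "blue whale"], ["paris", "four", "Blue  whale"]))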

lyndonkl / claude-evaluation-rubrics exact

Use when explicit quality criteria and scoring scales are needed to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, or reduce subjective bias, or when the user mentions...
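
For orientation, a minimal sketch of the explicit criteria, scoring scales, and acceptance thresholds this rubrics skill describes; the criteria names, weights, and threshold are illustrative assumptions:

    RUBRIC_WEIGHTS = {"correctness": 0.5, "clarity": 0.3, "completeness": 0.2}
    ACCEPTANCE_THRESHOLD = 3.5  # weighted mean on a 1-5 scale

    def weighted_score(scores: dict[str, int]) -> float:
        """Weighted mean of per-criterion scores, each on a 1-5 scale."""
        return sum(RUBRIC_WEIGHTS[criterion] * score for criterion, score in scores.items())

    def accept(scores: dict[str, int]) -> bool:
        """Apply the acceptance threshold to the weighted score."""
        return weighted_score(scores) >= ACCEPTANCE_THRESHOLD

    # Example: 0.5*5 + 0.3*4 + 0.2*3 = 4.3 >= 3.5, so the work is accepted.
    print(accept({"correctness": 5, "clarity": 4, "completeness": 3}))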

K-Dense-AI / claude-scientific-skills-scholar-evaluation exact

Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions including problem formulation, methodology, analysis, and...