Meta-prompting framework for critiquing responses, analyzing solution trajectories, and evaluating AI-generated content quality
A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
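To make the EDD idea concrete, a minimal sketch of an eval-first loop follows; the `EvalCase` shape, `run_session` stub, and substring grading are illustrative assumptions, not the framework's actual API.

```python
# Minimal eval-driven development sketch: write eval cases first,
# then iterate on the session until they pass. All names here are
# assumptions for illustration, not the framework's real interface.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str    # input given to the session under test
    expected: str  # substring the response must contain to pass

def run_session(prompt: str) -> str:
    """Stand-in for invoking a Claude Code session; replace with a real runner."""
    return "stub response"

def grade(cases: list[EvalCase]) -> float:
    """Return the pass rate across all eval cases."""
    passed = sum(case.expected in run_session(case.prompt) for case in cases)
    return passed / len(cases)

cases = [EvalCase(prompt="Refactor foo()", expected="def foo")]
print(f"pass rate: {grade(cases):.0%}")
```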
Generate, evaluate, and iterate on agent skills using HuggingFace's Upskill tool. Transfer domain expertise from frontier models to smaller/local models.
Comprehensive quality auditing and evaluation of tools, frameworks, and systems against industry best practices with detailed scoring across 12 critical dimensions
Explore candidate solutions before committing. Use when you have a problem statement and need to evaluate approaches: band-aid, optimize, reframe, or redesign.
Skill for building LLM evaluations.
Deep research expert for comprehensive technical investigations. Use when conducting technology evaluations, comparing solutions, analyzing papers, or exploring technical trends.
Measure quality effectively with actionable metrics. Use when establishing quality dashboards, defining KPIs, or evaluating test effectiveness.
Conducts thorough landscape research, competitive analysis, best practices evaluation, and evidence-based recommendations. Expert in market research and trend analysis.
Compare multiple alternatives using explicit criteria, weighted scoring, and tradeoff analysis. Use when choosing between options, evaluating alternatives, or making decisions.
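As a concrete illustration of weighted scoring, here is a small Python sketch; the criteria, weights, and option scores are made-up placeholders, not values the skill prescribes.

```python
# Weighted-scoring sketch: each option is scored per criterion, each
# criterion carries a weight, and the weighted sum ranks the options.
criteria_weights = {"cost": 0.5, "performance": 0.3, "maintainability": 0.2}

options = {
    "option_a": {"cost": 7, "performance": 9, "maintainability": 6},
    "option_b": {"cost": 9, "performance": 6, "maintainability": 8},
}

def weighted_score(scores: dict[str, float]) -> float:
    # Weighted sum over all criteria.
    return sum(scores[c] * w for c, w in criteria_weights.items())

# Rank options from best to worst weighted score.
for name, scores in sorted(options.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```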
Analyze financial data, build models, evaluate investments, and provide data-driven financial recommendations
Use when designing prompts for LLMs, optimizing model performance, building evaluation frameworks, or implementing advanced prompting techniques such as chain-of-thought and few-shot learning.
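To illustrate the techniques named above, a short Python sketch of a few-shot prompt with a chain-of-thought cue follows; the example pairs and template wording are assumptions, not a prescribed format.

```python
# Few-shot prompt builder with a chain-of-thought trigger. The example
# Q/A pairs and phrasing are placeholders for illustration only.
FEW_SHOT_EXAMPLES = [
    ("Is 17 prime?", "17 has no divisors between 2 and 4, so yes."),
    ("Is 21 prime?", "21 = 3 * 7, so no."),
]

def build_prompt(question: str) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES)
    # "Let's think step by step" is the classic chain-of-thought cue.
    return f"{shots}\n\nQ: {question}\nA: Let's think step by step."

print(build_prompt("Is 91 prime?"))
```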
Systematic peer review toolkit. Evaluate methodology, statistics, design, reproducibility, ethics, figure integrity, and reporting standards for manuscript and grant review across disciplines.
Enforce policies, guardrails, and permission boundaries; refuse unsafe actions and apply least privilege. Use when evaluating actions against policies, checking permissions, or reducing scope to the minimum required.
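One way to picture least-privilege enforcement is a simple permission gate, sketched below in Python; the `ALLOWED` policy table and `permitted` helper are hypothetical, not part of the skill's interface.

```python
# Minimal least-privilege gate: an action is allowed only if every
# permission it requires appears in the caller's granted set.
ALLOWED = {"read": {"fs.read"}, "write": {"fs.read", "fs.write"}}

def permitted(action: str, granted: set[str]) -> bool:
    required = ALLOWED.get(action)
    # Unknown actions have no policy entry and are refused outright.
    return required is not None and required <= granted

assert permitted("read", {"fs.read"})
assert not permitted("write", {"fs.read"})  # missing fs.write -> refused
```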
Evaluate research rigor. Assess methodology, experimental design, statistical validity, biases, confounding, and evidence quality (GRADE, Cochrane RoB) for critical analysis of scientific claims.
Quick classifier training with automatic model selection, hyperparameter tuning, and comprehensive evaluation metrics.
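A hedged sketch of what such a pipeline can look like with scikit-learn (the description does not name a backing library, so this choice is an assumption): grid-search two model families, pick the best by cross-validated score, and report metrics.

```python
# Quick-classifier sketch: automatic model selection via grid search
# over two candidate families, then a held-out evaluation report.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = [
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [50, 200]}),
]

# Fit a grid search per candidate; keep the one with the best CV score.
best = max(
    (GridSearchCV(model, grid, cv=5).fit(X_train, y_train) for model, grid in candidates),
    key=lambda search: search.best_score_,
)
print(classification_report(y_test, best.predict(X_test)))
```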