General principles for structured content modeling that apply across CMSs, with Sanity-specific guidance. Use when designing content schemas, planning content architecture, or evaluating content...
Threat modeling skill for identifying security threats and attack surfaces, and for designing mitigations. This skill should be used when performing threat assessments using STRIDE, PASTA, or Attack...
This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO, and reward...
Business model design and validation using Business Model Canvas, Lean Canvas, and Value Proposition Canvas. Use when designing new business models, validating startup ideas, achieving...
Use when you need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, reduce subjective bias, or when the user mentions...
Deploy machine learning models to production using Flask, FastAPI, Docker, cloud platforms (AWS, GCP, Azure), and model serving frameworks
Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or...
Create a Technology Evaluation Pack (problem framing, options matrix, build vs buy, pilot plan, risk review, decision memo). Use for evaluating new tech, emerging technology, AI tools, vendor...
Create production-ready Effect domain models using Schema.TaggedStruct for ADTs and Schema.Data for automatic equality, with comprehensive predicates, orders, guards, and match functions (a TypeScript sketch follows this list). Use when...
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents achieve less than 50% on real-world...
Work with climate data, models, and projections for climate impact assessment, downscaling, and scenario analysis using CMIP6 and other climate datasets. Use when "climate model, climate...
This skill should be used when the user asks to "evaluate a DSPy program", "test my DSPy module", "measure performance", "create evaluation metrics", "use answer_exact_match or SemanticF1",...
Make an evidence-based hiring decision and produce a Candidate Evaluation Decision Pack (criteria + scorecard, signal log, work sample/trial plan + rubric, reference check script + summary,...
Update model references in skill files when new Claude models are released
Configure LLM providers, use fallback models, handle streaming, and manage model settings in PydanticAI. Use when selecting models, implementing resilience, or optimizing API calls.
Use this skill when developing browser/web applications (React/Vue/Angular, static websites, SPAs) that need AI capabilities. Features text generation (generateText) and streaming (streamText) via... A TypeScript usage sketch follows this list.
Help users evaluate emerging technologies. Use when someone is assessing new tools, making build vs buy decisions, evaluating AI vendors, or deciding on technical architecture.
Use this skill when developing Node.js backend services or CloudBase cloud functions (Express/Koa/NestJS, serverless, backend APIs) that need AI capabilities. Features text generation...
Use this skill when developing WeChat Mini Programs (Mini Programs, WeCom Mini Programs, wx.cloud-based apps) that need AI capabilities. Features text generation (generateText) and streaming (streamText) with callback... A callback-style sketch follows this list.
Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options+criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers)....
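
For the Effect domain-modeling entry above, a minimal sketch of the Schema.TaggedStruct pattern it names, assuming Effect 3.x's Schema module; version details may differ, and the Schema.Data equality wrapper is not shown:

```ts
import { Schema } from "effect";

// Two ADT variants; TaggedStruct adds a literal `_tag` field as the discriminant.
const Circle = Schema.TaggedStruct("Circle", { radius: Schema.Number });
const Square = Schema.TaggedStruct("Square", { side: Schema.Number });

const Shape = Schema.Union(Circle, Square);
type Shape = Schema.Schema.Type<typeof Shape>;

// The generated `make` constructor fills in the tag.
const c: Shape = Circle.make({ radius: 2 });

// A guard and a match-style function dispatching on the discriminant.
const isCircle = (s: Shape): boolean => s._tag === "Circle";
const area = (s: Shape): number =>
  s._tag === "Circle" ? Math.PI * s.radius ** 2 : s.side ** 2;

console.log(isCircle(c), area(c));
```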
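For the browser and Node.js entries above, the truncated descriptions name generateText and streamText but not the exact package; this sketch assumes the Vercel AI SDK, whose exports share those names, with an OpenAI provider as a stand-in. The same calls run in a Node.js backend; only the surrounding app code changes.

```ts
import { generateText, streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// One-shot generation: resolves with the full completion.
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "In one sentence, what is streaming generation?",
});
console.log(text);

// Streaming: consume chunks as they arrive, e.g. to update the UI incrementally.
const result = streamText({
  model: openai("gpt-4o-mini"),
  prompt: "Explain why streaming improves perceived latency.",
});
for await (const chunk of result.textStream) {
  console.log(chunk);
}
```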
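For the Mini Program entry above, streaming is described as callback-based, but the SDK surface is cut off in the description; the interface and stub below are hypothetical, written only to show the callback flow end to end.

```ts
// Hypothetical shape of a callback-based streaming API (illustrative names,
// not the SDK's actual surface).
interface StreamCallbacks {
  onText: (chunk: string) => void;    // called for each streamed segment
  onDone: (fullText: string) => void; // called once the stream completes
  onError: (err: Error) => void;
}

// A stub that simulates streaming so the callback flow runs standalone.
function streamTextWithCallbacks(prompt: string, cb: StreamCallbacks): void {
  const chunks = ["Hello", ", ", "Mini ", "Program"];
  let full = "";
  let i = 0;
  const timer = setInterval(() => {
    if (i < chunks.length) {
      full += chunks[i];
      cb.onText(chunks[i]);
      i += 1;
    } else {
      clearInterval(timer);
      cb.onDone(full);
    }
  }, 50);
}

streamTextWithCallbacks("Say hi", {
  onText: (c) => console.log("chunk:", c),
  onDone: (t) => console.log("done:", t),
  onError: (e) => console.error(e),
});
```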