Select optimal LLM(s) for a task based on skill requirements, budget, and constraints. Uses the `which-llm` CLI to query Artificial Analysis benchmarks enriched with capability data from models.dev.
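To make the selection logic concrete, below is a minimal sketch of the filter-and-rank step such a query performs; the data shapes, field names, and scoring rule are illustrative assumptions, not the actual `which-llm` interface or output.

```python
# Hedged sketch: ranking candidate models by required skill scores under a budget.
# The data shape (name, scores, price_per_mtok) is illustrative, not the real
# output format of which-llm, Artificial Analysis, or models.dev.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    price_per_mtok: float  # blended $ per million tokens (assumed unit)
    scores: dict[str, float] = field(default_factory=dict)  # skill -> benchmark score

def select(candidates: list[Candidate], required_skills: list[str],
           budget_per_mtok: float, top_k: int = 3) -> list[Candidate]:
    """Keep in-budget models that cover every required skill, rank by mean skill score."""
    eligible = [
        c for c in candidates
        if c.price_per_mtok <= budget_per_mtok
        and all(s in c.scores for s in required_skills)
    ]
    eligible.sort(
        key=lambda c: sum(c.scores[s] for s in required_skills) / len(required_skills),
        reverse=True,
    )
    return eligible[:top_k]
```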
Skill for building LLM evaluations.
Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure...
Build and run evaluators for AI/LLM applications using Phoenix.
Create an AI Evals Pack (eval PRD, test set, rubric, judge plan, results + iteration loop). Use for LLM evaluation, benchmarks, rubrics, error analysis/open coding, and ship/no-ship quality gates...
Run and create evals for testing agent behavior. Use when the user wants to create or run an eval.
EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan...
LLM-as-judge methodology for comparing code implementations across repositories. Scores implementations on functionality, security, test quality, overengineering, and dead code using weighted...
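As a concrete illustration of the weighted scoring, a hedged sketch follows; the weights and the 1-5 scale are assumptions for illustration, not the skill's actual rubric.

```python
# Hedged sketch of combining per-dimension judge scores into one weighted total.
# The weights and the 1-5 scale are illustrative assumptions, not the skill's rubric.
WEIGHTS = {
    "functionality": 0.35,
    "security": 0.25,
    "test_quality": 0.20,
    "overengineering": 0.10,  # judged inversely: higher score = less overengineering
    "dead_code": 0.10,        # judged inversely: higher score = less dead code
}

def weighted_score(judge_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed 1-5) into a single weighted value."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[dim] * judge_scores[dim] for dim in WEIGHTS)

# Example: compare two implementations of the same feature across repositories.
impl_a = {"functionality": 5, "security": 4, "test_quality": 3, "overengineering": 4, "dead_code": 5}
impl_b = {"functionality": 4, "security": 5, "test_quality": 4, "overengineering": 3, "dead_code": 4}
print(weighted_score(impl_a), weighted_score(impl_b))
```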
Security guidelines for LLM applications based on OWASP Top 10 for LLM 2025. Use when building LLM apps, reviewing AI security, implementing RAG systems, or asking about LLM vulnerabilities like...
Detects common LLM coding agent artifacts in codebases. Identifies test quality issues, dead code, over-abstraction, and verbose LLM style patterns. Use when cleaning up AI-generated code or...
Integrating local and cloud LLMs into Unity games for AI NPCs, dialogue, and intelligent behaviors. Use when "unity llm, llmunity, unity ai npc, unity local llm, unity sentis llm, unity chatgpt,...
Integrating local LLMs into Godot games using NobodyWho and other Godot-native solutions. Use when "godot llm, nobodywho, godot ai npc, gdscript llm, godot local llm, godot chatgpt, godot 4 ai,...
LLM application architecture expert for RAG, prompting, agents, and production AI systems. Use when "rag system, prompt engineering, llm application, ai agent, structured output, chain of thought,...
LLM and AI application security testing skill for prompt injection, jailbreaking, and AI system vulnerabilities. This skill should be used when testing AI/ML applications for security issues,...
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...
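A minimal sketch of the automated-metric half of such an evaluation loop, assuming a plain exact-match check over labeled test cases; real setups layer on judge models and human feedback.

```python
# Hedged sketch: exact-match accuracy over a small labeled test set.
# call_model is a stand-in for whatever client the application under test uses.
from typing import Callable

TestCase = tuple[str, str]  # (prompt, expected answer)

def exact_match_accuracy(call_model: Callable[[str], str], cases: list[TestCase]) -> float:
    """Fraction of cases where the model's answer matches the label (case-insensitive)."""
    hits = sum(
        1 for prompt, expected in cases
        if call_model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(cases)

cases = [
    ("What is the capital of France?", "Paris"),
    ("2 + 2 =", "4"),
]
# print(exact_match_accuracy(my_model_fn, cases))
```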
Write effective LLM prompts, commands, and agent instructions. Goal-oriented over step-prescriptive. Role + Objective + Latitude pattern. Use when writing prompts, designing agents, building...
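To illustrate the Role + Objective + Latitude pattern, a hedged sketch of one possible prompt template follows; the wording is an example, not the skill's canonical template.

```python
# Hedged sketch of a Role + Objective + Latitude prompt: state who the agent is,
# what outcome it must achieve, and how much freedom it has in getting there.
PROMPT = """\
Role: You are a release-notes editor for a developer tools team.
Objective: Turn the merged pull-request titles below into a concise changelog entry
grouped by feature, fix, and breaking change.
Latitude: Reword titles freely for clarity and merge duplicates, but do not invent
changes that are not in the list, and keep the entry under 200 words.

Pull requests:
{pr_titles}
"""

def build_prompt(pr_titles: list[str]) -> str:
    return PROMPT.format(pr_titles="\n".join(f"- {t}" for t in pr_titles))
```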
Route LLM requests to different AI providers (OpenAI, Grok/xAI, Groq, DeepSeek, OpenRouter) using SwiftOpenAI-CLI. Use this skill when users ask to...
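To show what provider routing amounts to in practice, here is a hedged sketch that maps provider names to OpenAI-compatible base URLs and API-key environment variables; the endpoint table reflects commonly documented values and should be verified, and it says nothing about SwiftOpenAI-CLI's actual flags or configuration.

```python
# Hedged sketch: routing a chat request to one of several OpenAI-compatible providers
# by swapping the base URL and API key. Endpoint values are commonly documented but
# should be checked against each provider's docs; this is not SwiftOpenAI-CLI config.
import os
from openai import OpenAI  # assumes the `openai` Python package is installed

PROVIDERS = {
    "openai":     ("https://api.openai.com/v1",      "OPENAI_API_KEY"),
    "xai":        ("https://api.x.ai/v1",            "XAI_API_KEY"),
    "groq":       ("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
    "deepseek":   ("https://api.deepseek.com/v1",    "DEEPSEEK_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1",   "OPENROUTER_API_KEY"),
}

def chat(provider: str, model: str, prompt: str) -> str:
    """Send one user message to the chosen provider and return the reply text."""
    base_url, key_env = PROVIDERS[provider]
    client = OpenAI(base_url=base_url, api_key=os.environ[key_env])
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# e.g. chat("openai", "gpt-4o-mini", "Say hello.")  # model name is an example
```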