Search: evaluation-framework | AgentSkillsRepo

5423 results (49.1ms) page 3 / 272

nemo-evaluator-sdk 0.00

zechenzhangAGI / ai-research-skills-nemo-evaluator-sdk exact

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or...

★ 1,712 ai

ai ai-research claude claude-code

evaluating-new-technology 0.00

liqiongyu / lenny-skills-plus-evaluating-new-technology exact

Create a Technology Evaluation Pack (problem framing, options matrix, build vs buy, pilot plan, risk review, decision memo). Use for evaluating new tech, emerging technology, AI tools, vendor...

★ 14 ai

agent-skills ai-agents automation claude

agent-evaluation 0.00

404kidwiz / agent-skills-backup-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

agent-evaluation 0.00

shishiv / gsd-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

agent-evaluation 0.00

halay08 / fullstack-agent-skills-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

agent-evaluation 0.00

ngxtm / devkit-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

agent ai automation claude

agent-evaluation 0.00

sickn33 / antigravity-awesome-skills-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 2,844 ai

agentic-skills ai-agents antigravity autonomous-coding

agent-evaluation 0.00

Ianfr13 / claude-code-plugins-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

agent-evaluation 0.00

cleodin / antigravity-awesome-skills-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 1 ai

agentic-skills ai-agents antigravity antigravity-ide

agent-evaluation 0.00

ramidamolis-alt / agent-skills-workflows-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

agent-evaluation 0.00

automindtechnologie-jpg / ultimate-skill-md-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

★ 0 ai

tech-stack-evaluator 0.00

matteocervelli / llms-tech-stack-evaluator exact

Auto-activates during requirements analysis to evaluate technical stack

★ 16 tools

cofounder-evaluator 0.00

shipshitdev / library-cofounder-evaluator exact

Use this skill when users need to evaluate potential co-founders, assess founder compatibility, design equity splits, or navigate co-founder relationships. Activates for "should I work with this...

★ 4 ai

claude-code codex commands skills

Framework Orchestrator 0.00

daffy0208 / ai-dev-standards-framework-orchestrator exact

Meta-skill that coordinates all frameworks and skills throughout the project lifecycle, providing intelligent sequencing based on project patterns

★ 7 devops

spring-framework 0.00

mindrally / skills-spring-framework exact

Expert guidance for Spring Framework and Spring Boot development with Java best practices, dependency injection, and RESTful API design

★ 3 web

Metasploit Framework 0.00

cleodin / antigravity-awesome-skills-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

★ 1 development

agentic-skills ai-agents antigravity antigravity-ide

Metasploit Framework 0.00

halay08 / fullstack-agent-skills-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

★ 0 development

Metasploit Framework 0.00

zebbern / claude-code-guide-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

★ 3,228 ai

ai ai-agent ai-agent-tools claude

Metasploit Framework 0.00

sickn33 / antigravity-awesome-skills-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

★ 2,844 development

agentic-skills ai-agents antigravity autonomous-coding

Metasploit Framework 0.00

404kidwiz / agent-skills-backup-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

★ 0 development