5423 results (49.1ms) page 3 / 272
zechenzhangAGI / ai-research-skills-nemo-evaluator-sdk exact

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or...

liqiongyu / lenny-skills-plus-evaluating-new-technology exact

Create a Technology Evaluation Pack (problem framing, options matrix, build vs buy, pilot plan, risk review, decision memo). Use for evaluating new tech, emerging technology, AI tools, vendor...

404kidwiz / agent-skills-backup-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

shishiv / gsd-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

halay08 / fullstack-agent-skills-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

ngxtm / devkit-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

sickn33 / antigravity-awesome-skills-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

Ianfr13 / claude-code-plugins-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

cleodin / antigravity-awesome-skills-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

ramidamolis-alt / agent-skills-workflows-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

automindtechnologie-jpg / ultimate-skill-md-agent-evaluation exact

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world...

shipshitdev / library-cofounder-evaluator exact

Use this skill when users need to evaluate potential co-founders, assess founder compatibility, design equity splits, or navigate co-founder relationships. Activates for "should I work with this...

daffy0208 / ai-dev-standards-framework-orchestrator exact

Meta-skill that coordinates all frameworks and skills throughout the project lifecycle, providing intelligent sequencing based on project patterns

mindrally / skills-spring-framework exact

Expert guidance for Spring Framework and Spring Boot development with Java best practices, dependency injection, and RESTful API design

cleodin / antigravity-awesome-skills-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

halay08 / fullstack-agent-skills-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

zebbern / claude-code-guide-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

sickn33 / antigravity-awesome-skills-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...

404kidwiz / agent-skills-backup-metasploit-framework exact

This skill should be used when the user asks to "use Metasploit for penetration testing", "exploit vulnerabilities with msfconsole", "create payloads with msfvenom", "perform post-exploitation",...