A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking...
AI agents and Nix: parameterizable skills, instructions, and tools, packaged together in a reproducible and modular fashion
Install Agent Skills into your AI coding agent. Supports Claude Code, Goose, OpenCode, Cursor, and other harnesses.
Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or...
Control the Grabbit CLI to record browser interactions (HAR) and generate API workflows. Use this skill when the user wants to: (1) Automate browser actions, (2) Capture web traffic for API...
Write unit and integration tests for Angular v21+ applications using Vitest or Jasmine with TestBed, component harnesses, and modern testing patterns. Use for testing components with signals,...
Analyze Claude Code sessions via Braintrust
Unit-aware computation with Pint - convert units, dimensional analysis, unit arithmetic
Code quality checks, formatting, and metrics via qlty CLI
Search library documentation and code examples via Nia
AST-based code search and refactoring via ast-grep MCP