# skill-optimizer

by btxbtwn

# Install this skill

npx skills add btxbtwn/list-this-skill-suite --skill "skill-optimizer"

This installs a specific skill from a multi-skill repository.

# Description

Improve another skill by running repeated test prompts, scoring outputs against binary eval criteria, and proposing tighter SKILL.md instructions. Use when the user wants to auto-improve a skill, build an eval suite for a skill, adapt Karpathy-style autoresearch to prompt/skill optimization, or run iterative optimization on a skill such as workout-log.

# SKILL.md


---
name: skill-optimizer
description: Improve another skill by running repeated test prompts, scoring outputs against binary eval criteria, and proposing tighter SKILL.md instructions. Use when the user wants to auto-improve a skill, build an eval suite for a skill, adapt Karpathy-style autoresearch to prompt/skill optimization, or run iterative optimization on a skill such as workout-log.
---

# Skill Optimizer

Use this skill to optimize another skill's instructions with an eval-driven loop.

- Read references/autoresearch-pattern.md for the minimal pattern lifted from Karpathy's autoresearch repo.
- Read references/eval-design.md when designing binary evals.
- Read references/workout-log-evals.md when the target skill is workout-log.
- Use scripts/start_workout_log_eval.sh to initialize a repeatable workout-log eval run and store its artifacts under assets/workout-log-eval-run/.
- Use scripts/score_template.md as the scoring sheet for each pass.
- Use scripts/append_result.py to append summary scores to the TSV after a pass.
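The repository's append_result.py is not reproduced here; as a rough sketch, appending a pass summary to a results TSV could look like the following (the path, column names, and function signature are assumptions, not the script's actual interface):

```python
import csv
from pathlib import Path

def append_result(tsv_path, pass_id, target_skill, passed, total):
    """Append one eval-pass summary row to a TSV, writing a header first if the file is new."""
    path = Path(tsv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        if is_new:
            writer.writerow(["pass_id", "target_skill", "passed", "total", "score"])
        # Score is the fraction of binary checks that passed in this pass.
        writer.writerow([pass_id, target_skill, passed, total, f"{passed / total:.2f}"])

append_result("assets/workout-log-eval-run/results.tsv", 1, "workout-log", 6, 8)
```

A TSV keeps each pass's score on one line, so comparing baseline and revised runs is a simple diff of rows.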

## Core loop

  1. Pick one target skill only.
  2. Read the target SKILL.md and any required references.
  3. Build a small test set of realistic user prompts.
  4. Define binary eval questions for each prompt.
  5. Run the target skill against the tests.
  6. Score the outputs.
  7. Tighten the target skill instructions.
  8. Re-run the same tests.
  9. Keep the improved version only if its score is better, or if it is meaningfully simpler at an equal score.
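As a sketch, steps 5 through 9 of the loop can be expressed in Python; `run_skill` is a hypothetical stand-in for actually invoking the target skill against a prompt:

```python
def optimize(run_skill, tests, instructions, revised_instructions):
    """Score baseline vs. revised instructions on the same tests; keep the revision only if it wins.

    run_skill(instructions, prompt) -> output string (hypothetical agent invocation)
    tests: list of (prompt, [binary check functions]) pairs
    """
    def score(instr):
        passed = total = 0
        for prompt, checks in tests:
            output = run_skill(instr, prompt)
            for check in checks:
                total += 1
                passed += bool(check(output))  # ambiguous results count as failures
        return passed / total

    baseline = score(instructions)
    revised = score(revised_instructions)
    return {"baseline": baseline, "revised": revised, "keep_revision": revised > baseline}
```

Because both versions run against the same tests, the comparison isolates the effect of the instruction edit itself.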

## Rules

  • Keep evals binary whenever possible.
  • Prefer 4-8 eval checks per test prompt.
  • Test with realistic prompts, not idealized prompts.
  • Do not optimize multiple skills in one pass.
  • Preserve the target skill's purpose; improve reliability, not scope creep.
  • If a result is ambiguous, mark the eval as failed instead of inventing a pass.
  • Favor simpler prompt edits over bloated prompt edits.
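To make the binary style concrete, here is a hypothetical set of checks for a single workout-log test prompt; each check is a strict yes/no, so anything ambiguous scores as a fail rather than an invented pass:

```python
import re

# Hypothetical binary eval checks for one workout-log test prompt,
# e.g. "log my back squats from this morning".
CHECKS = {
    "names_the_exercise": lambda out: "squat" in out.lower(),
    "records_sets_and_reps": lambda out: bool(re.search(r"\b\d+\s*x\s*\d+\b", out)),
    "includes_a_date": lambda out: bool(re.search(r"\b\d{4}-\d{2}-\d{2}\b", out)),
    "no_clarifying_question": lambda out: "?" not in out,
}

def score_output(out):
    """Return a yes/no verdict per check for one skill output."""
    return {name: bool(check(out)) for name, check in CHECKS.items()}

results = score_output("2024-05-01: back squat 5x5 at 225 lb")
```

Four to eight checks of this shape per prompt keep scoring fast and leave little room for graders to rationalize a pass.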

## Output shape

When reporting an optimization pass, include:
- target skill
- test prompts used
- eval criteria
- baseline score
- revised score
- what changed
- whether to keep the revision

Start with workout-log before trying browser-heavy skills: it is easier to evaluate, less noisy, and its failures are more obvious.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
