TeamZaobi / skills-master
# Install this skill:
npx skills add TeamZaobi/skills-master

Or install this skill directly from its repository: npx add-skill https://github.com/TeamZaobi/skills-master

# Description

Create, refactor, evaluate, and maintain skills or whole skill toolkits. Use whenever the user wants to design a new skill, rewrite an overfit skill, clean up outdated skill docs, set up evals, compare versions, improve trigger descriptions, or rationalize how skills are organized across tools.

# SKILL.md


---
name: skills-master
description: Create, refactor, evaluate, and maintain skills or whole skill toolkits. Use whenever the user wants to design a new skill, rewrite an overfit skill, clean up outdated skill docs, set up evals, compare versions, improve trigger descriptions, or rationalize how skills are organized across tools.
---


Skills Master

Use this skill to move a skill project forward end to end. Do not treat it as a prompt-writing exercise only. Inspect the current repository, identify the user's stage, and choose the lightest workflow that will produce a defensible result.

Working Style

  • Start from the actual repository state, not from inherited wording or remembered conventions.
  • Use accessible language unless the user clearly wants technical shorthand.
  • Prefer rewriting an outdated section cleanly over stacking more caveats onto it.
  • Separate portable guidance from platform-specific mechanics.
  • When the user only wants a focused cleanup, do that directly instead of forcing the full evaluation loop.

Decide The Job

Classify the request into one primary mode before editing:

  1. Create: there is no usable skill yet.
  2. Refactor: the skill exists but the structure, guidance, or scope is weak.
  3. Document cleanup: the repository drifted and the docs no longer match the real workflow.
  4. Evaluate: the user wants test prompts, benchmarks, or side-by-side comparison.
  5. Optimize triggering: the user wants better frontmatter descriptions and measurable trigger behavior.
  6. Package or distribute: the skill is done and needs linking or packaging.

If several modes apply, handle them in this order:

  1. Fix repository truth
  2. Fix skill content
  3. Add or repair evaluation
  4. Optimize triggering
  5. Package or link

Capture Intent

When creating or reshaping a skill, determine:

  1. What job the skill should make easier
  2. When it should trigger
  3. What success looks like
  4. What output format or artifacts matter
  5. Whether the task benefits from formal evaluation or only qualitative review

Pull answers from the conversation and repository first. Ask follow-up questions only when the missing detail changes the implementation meaningfully.

Authoring Rules

Frontmatter

Every skill must have:

  • name
  • description

Optional fields such as compatibility are fine when they clarify real runtime requirements.

Write descriptions for triggering, not for marketing. The description should describe user intent, not only implementation details.
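To make the triggering rule concrete, here is a minimal sketch for a hypothetical pdf-toolkit skill. The name, the description wording, and the compatibility note are all invented for illustration; in a real SKILL.md these fields live in YAML frontmatter rather than a Python dict.

```python
# Hypothetical frontmatter written for triggering: the description names
# user intents (convert, merge, compress) rather than implementation details.
frontmatter = {
    "name": "pdf-toolkit",                      # required
    "description": (                            # required
        "Convert, merge, split, and compress PDF files. Use when the user "
        "asks to transform a PDF, extract pages or text, or combine "
        "several documents into one."
    ),
    # Optional field, used only because it states a real runtime requirement.
    "compatibility": "requires a Python runtime with a PDF library available",
}

# Minimal check mirroring the rule above: every skill must have name + description.
missing = [k for k in ("name", "description") if not frontmatter.get(k)]
assert not missing
```

Note how the description leads with what the user is trying to do; an implementation-only description ("wraps pypdf helpers") would trigger far less reliably.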

Structure

Use the simplest structure that matches the task:

  • Workflow-based for sequential jobs
  • Task-based for collections of operations
  • Reference-based for standards, policies, or long-lived guidance
  • Capability-based for integrated systems with several related powers

Mix patterns only when it clearly improves usability.

Progressive Disclosure

Use the skill folder deliberately:

  • SKILL.md for the main operating instructions
  • scripts/ for deterministic or repetitive execution
  • references/ for supporting material that should be loaded only when needed
  • assets/ for templates, boilerplates, and output-side files

If a reference file becomes large, add navigation hints or a small table of contents.

Writing Guidance

  • Explain why an instruction matters.
  • Avoid brittle rule piles unless the task truly has hard constraints.
  • Prefer generalizable guidance over examples that only fit one test case.
  • If the docs drifted because of multiple iterations, rewrite the affected section as a whole instead of appending more exceptions.

Evaluation Workflow

Use the full loop only when it adds signal. For a small doc correction or a narrow rewrite, a lighter pass is usually better.

When To Use Formal Evals

Formal evals are especially useful for:

  • file transforms
  • extraction workflows
  • code generation with objective checks
  • multi-step procedures with stable success criteria

Qualitative review is usually enough for:

  • writing tone
  • branding voice
  • interface taste
  • other subjective outputs

Create The Eval Set

Store task prompts in evals/evals.json.

Use realistic prompts that a real user would actually send. Include relevant files when needed. Add assertions only when they are objectively checkable.

See references/schemas.md for the expected JSON structure.
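As an illustration only (the canonical structure is defined in references/schemas.md, so the field names here are assumptions), an eval set might look like:

```python
import json

# Two hypothetical evals: one with objectively checkable assertions,
# one subjective prompt that deliberately carries no assertions.
evals = [
    {
        "prompt": "Convert report.docx to a clean PDF and keep the headings.",
        "files": ["report.docx"],
        "assertions": [
            "output file exists and has a .pdf extension",
            "document headings are preserved in the PDF outline",
        ],
    },
    {
        # No objectively checkable outcome, so no assertions are added.
        "prompt": "Summarize the attached meeting notes in three bullets.",
        "files": ["notes.md"],
        "assertions": [],
    },
]

text = json.dumps(evals, indent=2)
parsed = json.loads(text)
assert len(parsed) == 2
```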

Run The Workspace Loop

Put results in a sibling workspace named <skill-name>-workspace/.

Organize by iteration:

<skill-name>-workspace/
└── iteration-1/
    └── eval-0/

When the environment supports independent task execution, run the skill version and a baseline in the same iteration:

  • New skill: compare with_skill against without_skill
  • Existing skill rewrite: compare the new draft against a snapshot of the old skill

Create eval_metadata.json per eval directory. Capture timing data as soon as the environment exposes it.
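The layout and metadata steps above can be sketched like this; the eval_metadata.json field names are assumptions, since the source does not fix a schema:

```python
import json
import tempfile
from pathlib import Path

# Build the workspace layout described above, with one hypothetical
# eval_metadata.json per eval directory. Timing fields are present but
# unfilled, matching the advice to capture timing once the environment
# exposes it.
root = Path(tempfile.mkdtemp()) / "my-skill-workspace"
for i in range(2):  # eval-0 and eval-1 inside iteration-1
    eval_dir = root / "iteration-1" / f"eval-{i}"
    eval_dir.mkdir(parents=True)
    metadata = {
        "eval_index": i,
        "condition": "with_skill",   # "without_skill" for the baseline run
        "started_at": None,          # fill in when timing data is available
        "duration_seconds": None,
    }
    (eval_dir / "eval_metadata.json").write_text(json.dumps(metadata, indent=2))

assert (root / "iteration-1" / "eval-1" / "eval_metadata.json").exists()
```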

Grade And Aggregate

After runs finish:

  1. Grade each run using agents/grader.md or an equivalent inline grading pass.
  2. Aggregate the iteration with:
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
  3. Read the benchmark for patterns, not just totals. Use agents/analyzer.md when you need a structured analyst pass.

Human Review

Use the bundled viewer instead of inventing one-off review pages:

python eval-viewer/generate_review.py <workspace>/iteration-N --skill-name "my-skill"

If the environment is headless, use the static HTML mode supported by the viewer and collect feedback.json from the user afterward.

Improve Without Overfitting

When revising after feedback:

  • generalize from the complaint
  • remove instructions that are not earning their keep
  • look for repeated helper work that should become a bundled script
  • avoid turning one stubborn edge case into a giant wall of rigid rules

Repeat the loop only while it produces meaningful improvement.

Blind Comparison

When the user wants a more rigorous A/B comparison between two skill versions, use the comparison flow:

  • agents/comparator.md
  • agents/analyzer.md

This is optional. Do not force it into ordinary skill cleanup work.

Trigger Description Optimization

This is a separate step. Only do it after the skill itself is already in reasonable shape.

Important Boundary

The current trigger optimization scripts are Anthropic / Claude specific, not tool-agnostic. They rely on:

  • anthropic
  • claude CLI
  • .claude/commands-based discovery behavior

If that stack is unavailable, skip this section rather than pretending it is portable.

Prepare The Eval Queries

Create about 20 realistic queries:

  • some should trigger
  • some should not trigger
  • the negative cases should be near misses, not obviously unrelated prompts

Use concrete, natural language that resembles real user requests.
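As a sketch, a trigger eval set for a hypothetical pdf-toolkit skill might look like the list below. The exact file format the optimizer expects is not shown in this document, so the keys are illustrative assumptions; a real set would have around 20 entries.

```python
# Negatives are deliberate near misses: adjacent document tasks the skill
# should NOT claim, rather than obviously unrelated prompts.
queries = [
    {"query": "Merge these three invoices into one PDF", "should_trigger": True},
    {"query": "Compress scan.pdf so it fits in an email", "should_trigger": True},
    {"query": "Extract the tables from chapter2.pdf", "should_trigger": True},
    # Near misses:
    {"query": "Proofread my report before I export it", "should_trigger": False},
    {"query": "Write a Word macro that numbers the pages", "should_trigger": False},
    {"query": "What PDF reader do you recommend for Linux?", "should_trigger": False},
]

positives = [q for q in queries if q["should_trigger"]]
negatives = [q for q in queries if not q["should_trigger"]]
assert positives and negatives
```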

Review The Query Set

Use assets/eval_review.html when you want the user to edit or approve the trigger eval set before running the optimizer.

Run The Loop

Prefer module execution for package-aware scripts:

python -m scripts.run_loop \
  --eval-set <path-to-trigger-evals.json> \
  --skill-path <path-to-skill> \
  --model <active-model-id> \
  --max-iterations 5 \
  --verbose

This loop evaluates the current description, proposes revisions, and keeps history. It chooses the best result by held-out performance when a holdout split is enabled.

Apply The Result

Take best_description, update the skill frontmatter, and show the before/after difference to the user with the measured scores.
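The apply step can be sketched as follows. This assumes conventional `---` delimited frontmatter with a single-line description field; a real pass would prefer a YAML parser (PyYAML is already a dependency) over line surgery, and multi-line folded descriptions would need it.

```python
def apply_description(skill_md: str, best_description: str) -> str:
    """Replace the description line inside ----delimited frontmatter.

    Handles only single-line `description:` fields; multi-line YAML
    values need a real parser.
    """
    parts = skill_md.split("---")
    front_lines = []
    for line in parts[1].splitlines():
        if line.startswith("description:"):
            front_lines.append(f"description: {best_description}")
        else:
            front_lines.append(line)
    parts[1] = "\n".join(front_lines) + "\n"
    return "---".join(parts)


old = """---
name: pdf-toolkit
description: old wording
---
Body text.
"""
new = apply_description(old, "Convert, merge, and compress PDF files.")
assert "old wording" not in new
```

After applying, show the user the before/after description alongside the measured scores, as the text above instructs.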

Linking And Packaging

Use one editable source of truth:

  • user scope: ~/.agents/skills/<skill-name>
  • project scope: <project-root>/.agents/skills/<skill-name>

Link outward to tool-specific discovery folders instead of maintaining multiple editable copies.
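The single-source-of-truth idea can be sketched with a symlink. The `.sometool/skills` target below is a made-up discovery folder for illustration; the repository's scripts/link_skill.py handles the real tool-specific locations.

```python
import tempfile
from pathlib import Path

# One editable copy under .agents/skills/, symlinked into a (hypothetical)
# tool-specific discovery folder. No second editable copy ever exists.
base = Path(tempfile.mkdtemp())
source = base / ".agents" / "skills" / "my-skill"    # editable source of truth
source.mkdir(parents=True)
(source / "SKILL.md").write_text("name: my-skill\n")

target = base / ".sometool" / "skills" / "my-skill"  # discovery location
target.parent.mkdir(parents=True)
target.symlink_to(source, target_is_directory=True)

# Edits to the source are immediately visible through the link.
assert (target / "SKILL.md").read_text() == "name: my-skill\n"
```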

Initialize

python scripts/init_skill.py my-skill --path ~/.agents/skills
python scripts/link_skill.py <skill-path>
python scripts/link_skill.py <skill-path> --status

Validate

python -m scripts.quick_validate <skill-path>

Package

python -m scripts.package_skill <skill-path>

Package only after the skill content and metadata are stable enough to share.

Environment Adaptation

Do not assume every environment has the same features.

  • If there are no subagents or parallel workers, run evals serially and lean more on human review.
  • If there is no browser, export static HTML or present results inline.
  • If there is no Anthropic stack, skip trigger optimization.
  • If the user only asked for documentation cleanup, do not drag them through benchmarking machinery they did not ask for.

Reference Files

Read these only when they are relevant to the current task:

  • agents/grader.md
  • agents/comparator.md
  • agents/analyzer.md
  • references/schemas.md
  • references/workflows.md
  • references/output-patterns.md

Final Check

Before you finish, confirm that:

  1. The repository's main docs agree with the real workflow.
  2. Platform-specific instructions are clearly labeled.
  3. The skill did not become longer just to preserve outdated history.
  4. The user can tell what is stable, what is optional, and what is environment-bound.

# README.md

Skills Master

skills-master is not a skill for a particular business domain, and it is not a generalized agent toolbox.

Its real purpose is simple: turn the job of developing skills into a reusable skill itself. In other words, when an agent needs to create, rewrite, trim, evaluate, or govern other skills, this skill supplies the method, the structure, and the supporting resources.

What This Repository Actually Is

This repository is the source and supporting resources for the skills-master meta-skill.

It does not serve end-user business tasks; it serves these skill development tasks:

  1. Creating a new skill from scratch
  2. Rewriting a skill that has become overfit, rule-stacked, and messier with every edit
  3. Cleaning up drifted documentation so that the README, SKILL.md, and script capabilities agree again
  4. Setting up evals, human review, comparison baselines, and an iteration loop for a skill
  5. Optimizing a skill's trigger description and structural organization
  6. Unifying the source of truth, links, and distribution across multiple tool environments

In one sentence: it is the skill used to develop and maintain other skills.

What It Is Not

To avoid drifting off course again, this repository should not be described as:

  • a general-purpose multi-agent framework
  • an automation repository centered on a collection of scripts
  • a release tool whose main goal is packaging .skill files
  • a compatibility layer built around one platform's legacy semantics

The scripts, the viewer, and the reference documents are supporting resources only. The subject is always the skill itself and how it helps an agent develop other skills better.

Core Method

The core method of skills-master is not "write more rules" but:

  1. First identify whether the current task is creation, rewriting, documentation correction, evaluation, trigger optimization, or distribution
  2. Fix repository truth first, then the skill content, then evals and distribution
  3. Avoid patch-style stacking of caveats; rewrite a drifted section as a whole
  4. Push capability down into scripts/ only when it is genuinely repetitive, brittle, or in need of determinism
  5. Keep portable methods and platform-specific mechanics clearly separated
  6. Use as little structure as is sufficient, so the skill stays maintainable long term instead of only working on a few samples

What Each Part Of The Repository Does

SKILL.md

This is the core. It defines how the agent should think, triage, rewrite, evaluate, and iterate when handling skill development tasks.

scripts/

These scripts are not the purpose of the repository; they exist to support the meta-skill:

  • initialize a skill
  • validate skill structure
  • manage links
  • package a skill
  • aggregate benchmarks
  • run trigger description evals on a specific platform

references/

Skill-design reference material that can be read on demand, such as workflow patterns, output patterns, and JSON structure conventions.

agents/

Role descriptions for grading, comparison, and analysis, used during the evaluation and iteration stages.

eval-viewer/

Generates a human review interface so a person can quickly inspect outputs and benchmarks across iterations.

This Skill's Main Path

Understood by its real purpose, the default path through this project is:

  1. Identify whether an existing or planned skill needs to be reworked
  2. Read or rewrite its SKILL.md
  3. Decide what belongs in the main document and what should move into scripts/, references/, or assets/
  4. Add an eval set and a human review flow when necessary
  5. Only after the structure is confirmed stable, consider trigger optimization, linking, and packaging

In other words, skill design and governance come first; scripts and release actions come second. Do not reverse that order.

Entry Points For Humans

If you are a human maintainer, start with these files:

  1. SKILL.md
  2. references/workflows.md
  3. references/output-patterns.md
  4. references/schemas.md

Where:

  • README.md explains why this repository exists
  • SKILL.md is what the agent actually follows at execution time

Supporting Scripts

Use these scripts only when needed; do not treat them as the project's purpose:

python3 scripts/init_skill.py my-skill --path ~/.agents/skills
python3 -m scripts.quick_validate /path/to/my-skill
python3 scripts/link_skill.py /path/to/my-skill --status
python3 -m scripts.package_skill /path/to/my-skill
python3 -m scripts.aggregate_benchmark /path/to/workspace/iteration-1 --skill-name my-skill

Dependency Boundaries

General Capabilities

The following capabilities are the regular part of this repository:

  • skill structure design
  • documentation rewriting
  • initialization, validation, linking, packaging, and benchmark aggregation

Minimal dependencies:

  • Python 3.9+
  • PyYAML
python3 -m pip install pyyaml

Platform-Specific Enhancements

The trigger description optimization scripts still depend on the Anthropic / Claude ecosystem:

  • anthropic
  • claude CLI
  • a corresponding authenticated environment
python3 -m pip install anthropic
python3 -m scripts.run_eval --help
python3 -m scripts.run_loop --help

If that environment is absent, it does not affect this repository's main purpose as a meta-skill.

Current Status

This repository has been through one pass of restating inherited documentation in terms of its real purpose, but some clear boundaries remain:

  • there is still no unified requirements.txt or pyproject.toml
  • some scripts are better run as python3 -m scripts.<name>
  • the trigger optimization pipeline is still not a cross-platform implementation

These are limitations of the supporting layer and do not affect the project's main positioning as a skill for developing skills.

Version

See VERSION.md for the current version notes.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
