TeamZaobi / skills-master
# Install this skill:
npx skills add TeamZaobi/skills-master

Or install this skill directly from its repository: npx add-skill https://github.com/TeamZaobi/skills-master

# Description

Create, refactor, evaluate, and maintain skills or whole skill toolkits. Use whenever the user wants to design a new skill, rewrite an overfit skill, clean up outdated skill docs, set up evals, compare versions, improve trigger descriptions, or rationalize how skills are organized across tools.

# SKILL.md


---
name: skills-master
description: Create, refactor, evaluate, and maintain skills or whole skill toolkits. Use whenever the user wants to design a new skill, rewrite an overfit skill, clean up outdated skill docs, set up evals, compare versions, improve trigger descriptions, or rationalize how skills are organized across tools.
---


Skills Master

Use this skill to move a skill project forward end to end. Do not treat it as a prompt-writing exercise only. Inspect the current repository, identify the user's stage, and choose the lightest workflow that will produce a defensible result.

Working Style

  • Start from the actual repository state, not from inherited wording or remembered conventions.
  • Use accessible language unless the user clearly wants technical shorthand.
  • Prefer rewriting an outdated section cleanly over stacking more caveats onto it.
  • Separate portable guidance from platform-specific mechanics.
  • When the user only wants a focused cleanup, do that directly instead of forcing the full evaluation loop.

Decide The Job

Classify the request into one primary mode before editing:

  1. Create: there is no usable skill yet.
  2. Refactor: the skill exists but the structure, guidance, or scope is weak.
  3. Document cleanup: the repository drifted and the docs no longer match the real workflow.
  4. Evaluate: the user wants test prompts, benchmarks, or side-by-side comparison.
  5. Optimize triggering: the user wants better frontmatter descriptions and measurable trigger behavior.
  6. Package or distribute: the skill is done and needs linking or packaging.

If several modes apply, handle them in this order:

  1. Fix repository truth
  2. Fix skill content
  3. Add or repair evaluation
  4. Optimize triggering
  5. Package or link

Capture Intent

When creating or reshaping a skill, determine:

  1. What job the skill should make easier
  2. When it should trigger
  3. What success looks like
  4. What output format or artifacts matter
  5. Whether the task benefits from formal evaluation or only qualitative review

Pull answers from the conversation and repository first. Ask follow-up questions only when the missing detail changes the implementation meaningfully.

Authoring Rules

Frontmatter

Every skill must have:

  • name
  • description

Optional fields such as compatibility are fine when they clarify real runtime requirements.

Write descriptions for triggering, not for marketing. The description should describe user intent, not only implementation details.
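To make the triggering rule concrete, here is a minimal sketch for a hypothetical pdf-toolkit skill. The name, the description wording, and the compatibility note are all invented for illustration; in a real SKILL.md these fields live in YAML frontmatter rather than a Python dict.

```python
# Hypothetical frontmatter written for triggering: the description names
# user intents (convert, merge, compress) rather than implementation details.
frontmatter = {
    "name": "pdf-toolkit",                      # required
    "description": (                            # required
        "Convert, merge, split, and compress PDF files. Use when the user "
        "asks to transform a PDF, extract pages or text, or combine "
        "several documents into one."
    ),
    # Optional field, used only because it states a real runtime requirement.
    "compatibility": "requires a Python runtime with a PDF library available",
}

# Minimal check mirroring the rule above: every skill must have name + description.
missing = [k for k in ("name", "description") if not frontmatter.get(k)]
assert not missing
```

Note how the description leads with what the user is trying to do; an implementation-only description ("wraps pypdf helpers") would trigger far less reliably.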

Structure

Use the simplest structure that matches the task:

  • Workflow-based for sequential jobs
  • Task-based for collections of operations
  • Reference-based for standards, policies, or long-lived guidance
  • Capability-based for integrated systems with several related powers

Mix patterns only when it clearly improves usability.

Progressive Disclosure

Use the skill folder deliberately:

  • SKILL.md for the main operating instructions
  • scripts/ for deterministic or repetitive execution
  • references/ for supporting material that should be loaded only when needed
  • assets/ for templates, boilerplates, and output-side files

If a reference file becomes large, add navigation hints or a small table of contents.

Writing Guidance

  • Explain why an instruction matters.
  • Avoid brittle rule piles unless the task truly has hard constraints.
  • Prefer generalizable guidance over examples that only fit one test case.
  • If the docs drifted because of multiple iterations, rewrite the affected section as a whole instead of appending more exceptions.

Evaluation Workflow

Use the full loop only when it adds signal. For a small doc correction or a narrow rewrite, a lighter pass is usually better.

When To Use Formal Evals

Formal evals are especially useful for:

  • file transforms
  • extraction workflows
  • code generation with objective checks
  • multi-step procedures with stable success criteria

Qualitative review is usually enough for:

  • writing tone
  • branding voice
  • interface taste
  • other subjective outputs

Create The Eval Set

Store task prompts in evals/evals.json.

Use realistic prompts that a real user would actually send. Include relevant files when needed. Add assertions only when they are objectively checkable.

See references/schemas.md for the expected JSON structure.
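As an illustration only (the canonical structure is defined in references/schemas.md, so the field names here are assumptions), an eval set might look like:

```python
import json

# Two hypothetical evals: one with objectively checkable assertions,
# one subjective prompt that deliberately carries no assertions.
evals = [
    {
        "prompt": "Convert report.docx to a clean PDF and keep the headings.",
        "files": ["report.docx"],
        "assertions": [
            "output file exists and has a .pdf extension",
            "document headings are preserved in the PDF outline",
        ],
    },
    {
        # No objectively checkable outcome, so no assertions are added.
        "prompt": "Summarize the attached meeting notes in three bullets.",
        "files": ["notes.md"],
        "assertions": [],
    },
]

text = json.dumps(evals, indent=2)
parsed = json.loads(text)
assert len(parsed) == 2
```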

Run The Workspace Loop

Put results in a sibling workspace named <skill-name>-workspace/.

Organize by iteration:

<skill-name>-workspace/
└── iteration-1/
    └── eval-0/

When the environment supports independent task execution, run the skill version and a baseline in the same iteration:

  • New skill: compare with_skill against without_skill
  • Existing skill rewrite: compare the new draft against a snapshot of the old skill

Create eval_metadata.json per eval directory. Capture timing data as soon as the environment exposes it.
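The layout and metadata steps above can be sketched like this; the eval_metadata.json field names are assumptions, since the source does not fix a schema:

```python
import json
import tempfile
from pathlib import Path

# Build the workspace layout described above, with one hypothetical
# eval_metadata.json per eval directory. Timing fields are present but
# unfilled, matching the advice to capture timing once the environment
# exposes it.
root = Path(tempfile.mkdtemp()) / "my-skill-workspace"
for i in range(2):  # eval-0 and eval-1 inside iteration-1
    eval_dir = root / "iteration-1" / f"eval-{i}"
    eval_dir.mkdir(parents=True)
    metadata = {
        "eval_index": i,
        "condition": "with_skill",   # "without_skill" for the baseline run
        "started_at": None,          # fill in when timing data is available
        "duration_seconds": None,
    }
    (eval_dir / "eval_metadata.json").write_text(json.dumps(metadata, indent=2))

assert (root / "iteration-1" / "eval-1" / "eval_metadata.json").exists()
```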

Grade And Aggregate

After runs finish:

  1. Grade each run using agents/grader.md or an equivalent inline grading pass.
  2. Aggregate the iteration with:
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
  3. Read the benchmark for patterns, not just totals. Use agents/analyzer.md when you need a structured analyst pass.

Human Review

Use the bundled viewer instead of inventing one-off review pages:

python eval-viewer/generate_review.py <workspace>/iteration-N --skill-name "my-skill"

If the environment is headless, use the static HTML mode supported by the viewer and collect feedback.json from the user afterward.

Improve Without Overfitting

When revising after feedback:

  • generalize from the complaint
  • remove instructions that are not earning their keep
  • look for repeated helper work that should become a bundled script
  • avoid turning one stubborn edge case into a giant wall of rigid rules

Repeat the loop only while it produces meaningful improvement.

Blind Comparison

When the user wants a more rigorous A/B comparison between two skill versions, use the comparison flow:

  • agents/comparator.md
  • agents/analyzer.md

This is optional. Do not force it into ordinary skill cleanup work.

Trigger Description Optimization

This is a separate step. Only do it after the skill itself is already in reasonable shape.

Important Boundary

The current trigger optimization scripts are Anthropic / Claude specific, not tool-agnostic. They rely on:

  • anthropic
  • claude CLI
  • .claude/commands-based discovery behavior

If that stack is unavailable, skip this section rather than pretending it is portable.

Prepare The Eval Queries

Create about 20 realistic queries:

  • some should trigger
  • some should not trigger
  • the negative cases should be near misses, not obviously unrelated prompts

Use concrete, natural language that resembles real user requests.
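As a sketch, a trigger eval set for a hypothetical pdf-toolkit skill might look like the list below. The exact file format the optimizer expects is not shown in this document, so the keys are illustrative assumptions; a real set would have around 20 entries.

```python
# Negatives are deliberate near misses: adjacent document tasks the skill
# should NOT claim, rather than obviously unrelated prompts.
queries = [
    {"query": "Merge these three invoices into one PDF", "should_trigger": True},
    {"query": "Compress scan.pdf so it fits in an email", "should_trigger": True},
    {"query": "Extract the tables from chapter2.pdf", "should_trigger": True},
    # Near misses:
    {"query": "Proofread my report before I export it", "should_trigger": False},
    {"query": "Write a Word macro that numbers the pages", "should_trigger": False},
    {"query": "What PDF reader do you recommend for Linux?", "should_trigger": False},
]

positives = [q for q in queries if q["should_trigger"]]
negatives = [q for q in queries if not q["should_trigger"]]
assert positives and negatives
```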

Review The Query Set

Use assets/eval_review.html when you want the user to edit or approve the trigger eval set before running the optimizer.

Run The Loop

Prefer module execution for package-aware scripts:

python -m scripts.run_loop \
  --eval-set <path-to-trigger-evals.json> \
  --skill-path <path-to-skill> \
  --model <active-model-id> \
  --max-iterations 5 \
  --verbose

This loop evaluates the current description, proposes revisions, and keeps history. It chooses the best result by held-out performance when a holdout split is enabled.

Apply The Result

Take best_description, update the skill frontmatter, and show the before/after difference to the user with the measured scores.
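The apply step can be sketched as follows. This assumes conventional `---` delimited frontmatter with a single-line description field; a real pass would prefer a YAML parser (PyYAML is already a dependency) over line surgery, and multi-line folded descriptions would need it.

```python
def apply_description(skill_md: str, best_description: str) -> str:
    """Replace the description line inside ----delimited frontmatter.

    Handles only single-line `description:` fields; multi-line YAML
    values need a real parser.
    """
    parts = skill_md.split("---")
    front_lines = []
    for line in parts[1].splitlines():
        if line.startswith("description:"):
            front_lines.append(f"description: {best_description}")
        else:
            front_lines.append(line)
    parts[1] = "\n".join(front_lines) + "\n"
    return "---".join(parts)


old = """---
name: pdf-toolkit
description: old wording
---
Body text.
"""
new = apply_description(old, "Convert, merge, and compress PDF files.")
assert "old wording" not in new
```

After applying, show the user the before/after description alongside the measured scores, as the text above instructs.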

Linking And Packaging

Use one editable source of truth:

  • user scope: ~/.agents/skills/<skill-name>
  • project scope: <project-root>/.agents/skills/<skill-name>

Link outward to tool-specific discovery folders instead of maintaining multiple editable copies.
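The single-source-of-truth idea can be sketched with a symlink. The `.sometool/skills` target below is a made-up discovery folder for illustration; the repository's scripts/link_skill.py handles the real tool-specific locations.

```python
import tempfile
from pathlib import Path

# One editable copy under .agents/skills/, symlinked into a (hypothetical)
# tool-specific discovery folder. No second editable copy ever exists.
base = Path(tempfile.mkdtemp())
source = base / ".agents" / "skills" / "my-skill"    # editable source of truth
source.mkdir(parents=True)
(source / "SKILL.md").write_text("name: my-skill\n")

target = base / ".sometool" / "skills" / "my-skill"  # discovery location
target.parent.mkdir(parents=True)
target.symlink_to(source, target_is_directory=True)

# Edits to the source are immediately visible through the link.
assert (target / "SKILL.md").read_text() == "name: my-skill\n"
```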

Initialize

python scripts/init_skill.py my-skill --path ~/.agents/skills
python scripts/link_skill.py <skill-path>
python scripts/link_skill.py <skill-path> --status

Validate

python -m scripts.quick_validate <skill-path>

Package

python -m scripts.package_skill <skill-path>

Package only after the skill content and metadata are stable enough to share.

Environment Adaptation

Do not assume every environment has the same features.

  • If there are no subagents or parallel workers, run evals serially and lean more on human review.
  • If there is no browser, export static HTML or present results inline.
  • If there is no Anthropic stack, skip trigger optimization.
  • If the user only asked for documentation cleanup, do not drag them through benchmarking machinery they did not ask for.

Reference Files

Read these only when they are relevant to the current task:

  • agents/grader.md
  • agents/comparator.md
  • agents/analyzer.md
  • references/schemas.md
  • references/workflows.md
  • references/output-patterns.md

Final Check

Before you finish, confirm that:

  1. The repository's main docs agree with the real workflow.
  2. Platform-specific instructions are clearly labeled.
  3. The skill did not become longer just to preserve outdated history.
  4. The user can tell what is stable, what is optional, and what is environment-bound.

# README.md

Skills Master

skills-master is not a skill for a particular business domain, and it is not a generalized agent toolbox.

Its real purpose is simple: turn the job of developing skills into a reusable skill itself. In other words, when an agent needs to create, rewrite, trim, evaluate, or govern other skills, this skill supplies the method, the structure, and the supporting resources.

What This Repository Actually Is

This repository is the source and supporting resources for the skills-master meta-skill.

It does not serve end-user business tasks; it serves these skill development tasks:

  1. Creating a new skill from scratch
  2. Rewriting a skill that has become overfit, rule-stacked, and messier with every edit
  3. Cleaning up drifted documentation so that the README, SKILL.md, and script capabilities agree again
  4. Setting up evals, human review, comparison baselines, and an iteration loop for a skill
  5. Optimizing a skill's trigger description and structural organization
  6. Unifying the source of truth, links, and distribution across multiple tool environments

In one sentence: it is the skill used to develop and maintain other skills.

What It Is Not

To avoid drifting off course again, this repository should not be described as:

  • a general-purpose multi-agent framework
  • an automation repository centered on a collection of scripts
  • a release tool whose main goal is packaging .skill files
  • a compatibility layer built around one platform's legacy semantics

The scripts, the viewer, and the reference documents are supporting resources only. The subject is always the skill itself and how it helps an agent develop other skills better.

Core Method

The core method of skills-master is not "write more rules" but:

  1. First identify whether the current task is creation, rewriting, documentation correction, evaluation, trigger optimization, or distribution
  2. Fix repository truth first, then the skill content, then evals and distribution
  3. Avoid patch-style stacking of caveats; rewrite a drifted section as a whole
  4. Push capability down into scripts/ only when it is genuinely repetitive, brittle, or in need of determinism
  5. Keep portable methods and platform-specific mechanics clearly separated
  6. Use as little structure as is sufficient, so the skill stays maintainable long term instead of only working on a few samples

What Each Part Of The Repository Does

SKILL.md

This is the core. It defines how the agent should think, triage, rewrite, evaluate, and iterate when handling skill development tasks.

scripts/

These scripts are not the purpose of the repository; they exist to support the meta-skill:

  • initialize a skill
  • validate skill structure
  • manage links
  • package a skill
  • aggregate benchmarks
  • run trigger description evals on a specific platform

references/

Skill-design reference material that can be read on demand, such as workflow patterns, output patterns, and JSON structure conventions.

agents/

Role descriptions for grading, comparison, and analysis, used during the evaluation and iteration stages.

eval-viewer/

Generates a human review interface so a person can quickly inspect outputs and benchmarks across iterations.

This Skill's Main Path

Understood by its real purpose, the default path through this project is:

  1. Identify whether an existing or planned skill needs to be reworked
  2. Read or rewrite its SKILL.md
  3. Decide what belongs in the main document and what should move into scripts/, references/, or assets/
  4. Add an eval set and a human review flow when necessary
  5. Only after the structure is confirmed stable, consider trigger optimization, linking, and packaging

In other words, skill design and governance come first; scripts and release actions come second. Do not reverse that order.

Entry Points For Humans

If you are a human maintainer, start with these files:

  1. SKILL.md
  2. references/workflows.md
  3. references/output-patterns.md
  4. references/schemas.md

Where:

  • README.md explains why this repository exists
  • SKILL.md is what the agent actually follows at execution time

Supporting Scripts

Use these scripts only when needed; do not treat them as the project's purpose:

python3 scripts/init_skill.py my-skill --path ~/.agents/skills
python3 -m scripts.quick_validate /path/to/my-skill
python3 scripts/link_skill.py /path/to/my-skill --status
python3 -m scripts.package_skill /path/to/my-skill
python3 -m scripts.aggregate_benchmark /path/to/workspace/iteration-1 --skill-name my-skill

Dependency Boundaries

General Capabilities

The following capabilities are the regular part of this repository:

  • skill structure design
  • documentation rewriting
  • initialization, validation, linking, packaging, and benchmark aggregation

Minimal dependencies:

  • Python 3.9+
  • PyYAML
python3 -m pip install pyyaml

Platform-Specific Enhancements

The trigger description optimization scripts still depend on the Anthropic / Claude ecosystem:

  • anthropic
  • claude CLI
  • a corresponding authenticated environment
python3 -m pip install anthropic
python3 -m scripts.run_eval --help
python3 -m scripts.run_loop --help

If that environment is absent, it does not affect this repository's main purpose as a meta-skill.

Current Status

This repository has been through one pass of restating inherited documentation in terms of its real purpose, but some clear boundaries remain:

  • there is still no unified requirements.txt or pyproject.toml
  • some scripts are better run as python3 -m scripts.<name>
  • the trigger optimization pipeline is still not a cross-platform implementation

These are limitations of the supporting layer and do not affect the project's main positioning as a skill for developing skills.

Version

See VERSION.md for the current version notes.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
