clawdbotborges

upskill

# Install this skill:
npx skills add clawdbotborges/upskill-skill

Or install a specific skill: npx add-skill https://github.com/clawdbotborges/upskill-skill

# Description

Generate, evaluate, and iterate on agent skills using HuggingFace's Upskill tool. Transfer domain expertise from frontier models to smaller/local models.

# SKILL.md

---
name: upskill
description: Generate, evaluate, and iterate on agent skills using HuggingFace's Upskill tool. Transfer domain expertise from frontier models to smaller/local models.
homepage: https://github.com/huggingface/upskill
---

Upskill: Agent Skill Generator & Evaluator

Generate validated agent skills with large models and deploy them on smaller, cheaper, or local models.

Quick Start

# Install
pip install upskill
# or one-off
uvx upskill --help

# Set API keys
export ANTHROPIC_API_KEY=sk-ant-...
export HF_TOKEN=hf_...

Core Commands

Generate a Skill

# From a task description (Opus generates by default)
upskill generate "build optimized CUDA kernels for PyTorch"

# From an agent trace (exported conversation)
upskill generate "write kernels" --from ./trace.md

# Iterate on an existing skill
upskill generate "add error handling and edge cases" \
  --from ./skills/my-skill/

# Generate with specific teacher, evaluate on local student
upskill generate "parse YAML configs" \
  --model opus \
  --eval-model "unsloth/GLM-4.7-Flash-GGUF:Q4_0" \
  --eval-base-url http://localhost:8080/v1

Evaluate a Skill

# Evaluate on cloud models
upskill eval ./skills/my-skill/ \
  --model haiku --model sonnet

# Evaluate on local model (llama.cpp server)
upskill eval ./skills/my-skill/ \
  --model "unsloth/GLM-4.7-Flash-GGUF:Q4_0" \
  --base-url http://localhost:8080/v1

# Multiple runs for statistical confidence
upskill eval ./skills/my-skill/ \
  --model haiku --model kimi --runs 5

How It Works

  1. A teacher model (Opus/Sonnet) generates the skill from a task description or trace.
  2. Test cases are auto-generated from the task.
  3. The baseline is measured by running the model without the skill.
  4. With-skill performance is measured by running the model with SKILL.md injected.
  5. Skill lift is reported on two axes: the change in accuracy and the change in token usage.
  6. If the improvement is insufficient, the tool iterates automatically.
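
The lift measurement in steps 3 to 5 can be sketched as below. This is an illustrative calculation only, not Upskill's actual implementation: the `RunResult` type, the `skill_lift` function, and all numbers are hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Aggregate result of one evaluation pass (hypothetical type)."""
    passed: int        # number of test cases that passed
    total: int         # number of test cases run
    avg_tokens: float  # average tokens used per test case

    @property
    def accuracy(self) -> float:
        return self.passed / self.total


def skill_lift(baseline: RunResult, with_skill: RunResult) -> dict:
    """Return both axes of lift: accuracy change and token usage change."""
    return {
        "accuracy_delta": with_skill.accuracy - baseline.accuracy,
        "token_delta": with_skill.avg_tokens - baseline.avg_tokens,
    }


# Made-up numbers for illustration only.
baseline = RunResult(passed=12, total=20, avg_tokens=1500.0)
with_skill = RunResult(passed=19, total=20, avg_tokens=1200.0)
lift = skill_lift(baseline, with_skill)
print(f"accuracy: {lift['accuracy_delta']:+.0%}, tokens: {lift['token_delta']:+.0f}")
```

A positive accuracy delta combined with a negative token delta is the best case. The two values are kept separate here because a skill can improve one axis while worsening the other.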

Output Structure

./skills/<skill-name>/
├── SKILL.md          # Main instructions (~500 tokens)
└── skill_meta.json   # Metadata and test cases

SKILL.md Format

---
name: my-skill
description: What this skill teaches the agent.
---

# Skill Title

## Overview
Brief description of the domain knowledge.

## Key Concepts
- Concept 1: explanation
- Concept 2: explanation

## Examples
Code examples, patterns, configurations.

## Common Pitfalls
What to avoid and why.
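
Before deploying a generated SKILL.md, it may help to check that the frontmatter is present and complete. The sketch below is a minimal checker, assuming flat `key: value` frontmatter as in the format above. The `parse_frontmatter` helper is hypothetical, is not part of Upskill, and does not handle full YAML.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter delimited by `---` lines.

    Only simple string values are handled; nested or multi-line YAML
    would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            # Split on the first colon only, so URL values stay intact.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


example = """---
name: my-skill
description: What this skill teaches the agent.
---

# Skill Title
"""

meta = parse_frontmatter(example)
missing = [key for key in ("name", "description") if not meta.get(key)]
print(meta["name"], "| missing fields:", missing)
```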

Test Cases Format (skill_meta.json)

{
  "cases": [
    {
      "input": "Create a build.toml for H100",
      "expected": {"contains": "9.0"}
    },
    {
      "input": "Write a CUDA kernel template",
      "expected": {"contains": "cuda_runtime.h"}
    }
  ]
}
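
One plausible reading of the `contains` expectation is a plain substring check against the model's output. The sketch below assumes that reading and is not taken from Upskill's source: `case_passes` and the sample outputs are hypothetical, and matching is assumed to be case-sensitive.

```python
import json

SKILL_META = """
{
  "cases": [
    {"input": "Create a build.toml for H100", "expected": {"contains": "9.0"}},
    {"input": "Write a CUDA kernel template", "expected": {"contains": "cuda_runtime.h"}}
  ]
}
"""


def case_passes(case: dict, model_output: str) -> bool:
    """Assumed semantics: pass if the expected substring occurs in the output."""
    return case["expected"]["contains"] in model_output


cases = json.loads(SKILL_META)["cases"]

# Hypothetical model outputs, one per case, for illustration only.
outputs = [
    'cuda-capabilities = ["9.0"]',
    "#include <stdio.h>",
]

results = [case_passes(case, output) for case, output in zip(cases, outputs)]
print(results)
```

Under this reading the first case passes and the second fails, because the second output never mentions `cuda_runtime.h`.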

Evaluation Output

Generating skill with sonnet...
Generating test cases...
Evaluating on sonnet... (attempt 1)
 60% -> 95% (+35%) OK

 my-skill
 SKILL.md ~520 tokens

 baseline   ████████████░░░░░░░░ 60%
 with skill ███████████████████░ 95% (+35%)

Saved to ./skills/my-skill

Cross-Model Comparison

┃ Model ┃ Pass Rate  ┃ Avg Assertions ┃ Avg Tokens ┃
│ haiku │ 4/5 (80%)  │ 2.8/3          │ 1250       │
│ kimi  │ 5/5 (100%) │ 3.0/3          │ 1890       │
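
The columns in a row like the one for haiku could be derived from per-run records roughly as follows. This is a sketch under assumptions, not Upskill's actual aggregation: the record shape is hypothetical, a run is assumed to pass only when all of its assertions pass, and the per-run numbers are invented so that they reproduce the haiku row.

```python
def summarize(runs: list) -> str:
    """Build one comparison row from (assertions_passed, assertions_total, tokens) tuples."""
    n = len(runs)
    # Assumption: a run counts as passed only if every assertion passed.
    passed = sum(1 for ok, total, _ in runs if ok == total)
    avg_assertions = sum(ok for ok, _, _ in runs) / n
    assertions_total = runs[0][1]
    avg_tokens = sum(tokens for _, _, tokens in runs) / n
    return (
        f"{passed}/{n} ({passed / n:.0%}) | "
        f"{avg_assertions:.1f}/{assertions_total} | {avg_tokens:.0f}"
    )


# Invented per-run data: five runs, three assertions each.
haiku_runs = [
    (3, 3, 1200),
    (3, 3, 1300),
    (3, 3, 1250),
    (3, 3, 1250),
    (2, 3, 1250),
]
row = summarize(haiku_runs)
print(row)
```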

Best Practices

  • Use expensive models as teachers: Opus/GPT-5 for generation, Haiku/local for evaluation
  • Always evaluate per-model: a skill that helps one model may not help another
  • Measure both axes: accuracy AND token usage matter
  • Iterate: if lift is insufficient, refine the skill with --from
  • Keep skills focused: ~500 tokens is ideal; don't bloat with unnecessary info
  • Skills are for hard/specialized tasks: don't create skills for things models already do well
  • Version control skills: they're just files, treat them like code

Using Skills with Agent Tools

Skills follow the Agent Skills specification and work with:

| Tool | Skill Location |
|------|----------------|
| Claude Code | .claude/skills/{name}/SKILL.md |
| Codex | .codex/skills/{name}/SKILL.md |
| Cursor | .cursor/skills/{name}/SKILL.md |
| OpenCode | .opencode/skills/{name}/SKILL.md |
| Clawdbot | skills/{name}/SKILL.md |

Simply copy the generated skill directory to the appropriate location.
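
The copy step can also be scripted. The sketch below assumes the locations listed above are relative to a project root. The `install_skill` helper and the tool keys are hypothetical names for illustration, and the demo runs in a temporary directory so no real configuration is touched.

```python
import shutil
import tempfile
from pathlib import Path

# Per-tool skill directories, assumed relative to a project root.
SKILL_DIRS = {
    "claude-code": ".claude/skills",
    "codex": ".codex/skills",
    "cursor": ".cursor/skills",
    "opencode": ".opencode/skills",
    "clawdbot": "skills",
}


def install_skill(skill_dir: Path, project_root: Path, tool: str) -> Path:
    """Copy a skill directory to <project_root>/<tool dir>/<skill name>/."""
    target = project_root / SKILL_DIRS[tool] / skill_dir.name
    # copytree creates missing parent directories; dirs_exist_ok allows re-installs.
    shutil.copytree(skill_dir, target, dirs_exist_ok=True)
    return target


# Demo with a throwaway skill in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "generated" / "my-skill"
    src.mkdir(parents=True)
    (src / "SKILL.md").write_text("---\nname: my-skill\n---\n")
    installed = install_skill(src, root / "project", "cursor")
    installed_ok = (installed / "SKILL.md").is_file()
    relative = installed.relative_to(root / "project").as_posix()

print(relative, installed_ok)
```

Copying the directory under its own name keeps the `{name}/SKILL.md` layout that each tool expects.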

Common Workflows

Transfer Knowledge to Local Models

# 1. Start local model server
llama-server -hf unsloth/GLM-4.7-Flash-GGUF:Q4_0

# 2. Generate skill with Opus, evaluate on local
upskill generate "your specialized task" \
  --model opus \
  --eval-model "unsloth/GLM-4.7-Flash-GGUF:Q4_0" \
  --eval-base-url http://localhost:8080/v1

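# 3. If lift is good, deploy the skill
# (no trailing slash on the source, so the directory itself is copied, not just its contents)
mkdir -p ~/.claude/skills
cp -r ./skills/my-skill ~/.claude/skills/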

Build a Skill from Existing Agent Trace

# Export trace from Claude Code, Cursor, etc.
# Then generate a skill from it
upskill generate "the task description" --from ./trace.md

# Evaluate on target models
upskill eval ./skills/my-skill/ --model haiku --model sonnet

Iterate on a Skill

# Start from existing skill, add improvements
upskill generate "add error handling for edge cases" \
  --from ./skills/my-skill/

# Re-evaluate to confirm improvement
upskill eval ./skills/my-skill/ --model haiku --runs 5

Resources

  • Repo: https://github.com/huggingface/upskill
  • Blog: https://huggingface.co/blog/upskill
  • Agent Skills Spec: https://agentskills.io
  • Example Skill: https://huggingface.co/hf-skills/h100-diffusers-kernel-builder
