upskill

by @clawdbotborges in AI & LLM

# Install this skill:

npx skills add clawdbotborges/upskill-skill

Or install a specific skill: npx add-skill https://github.com/clawdbotborges/upskill-skill

# Description

Generate, evaluate, and iterate on agent skills using HuggingFace's Upskill tool. Transfer domain expertise from frontier models to smaller/local models.

# SKILL.md

name: upskill
description: Generate, evaluate, and iterate on agent skills using HuggingFace's Upskill tool. Transfer domain expertise from frontier models to smaller/local models.
homepage: https://github.com/huggingface/upskill

Upskill — Agent Skill Generator & Evaluator

Generate validated agent skills with large models and deploy them on smaller, cheaper, or local models.

Quick Start

# Install
pip install upskill
# or one-off
uvx upskill --help

# Set API keys
export ANTHROPIC_API_KEY=sk-ant-...
export HF_TOKEN=hf_...

Core Commands

Generate a Skill

# From a task description (Opus generates by default)
upskill generate "build optimized CUDA kernels for PyTorch"

# From an agent trace (exported conversation)
upskill generate "write kernels" --from ./trace.md

# Iterate on an existing skill
upskill generate "add error handling and edge cases" \
  --from ./skills/my-skill/

# Generate with specific teacher, evaluate on local student
upskill generate "parse YAML configs" \
  --model opus \
  --eval-model "unsloth/GLM-4.7-Flash-GGUF:Q4_0" \
  --eval-base-url http://localhost:8080/v1

Evaluate a Skill

# Evaluate on cloud models
upskill eval ./skills/my-skill/ \
  --model haiku --model sonnet

# Evaluate on local model (llama.cpp server)
upskill eval ./skills/my-skill/ \
  --model "unsloth/GLM-4.7-Flash-GGUF:Q4_0" \
  --base-url http://localhost:8080/v1

# Multiple runs for statistical confidence
upskill eval ./skills/my-skill/ \
  --model haiku --model kimi --runs 5

How It Works

1. A teacher model (Opus/Sonnet) generates the skill from a task description or trace.
2. Test cases are auto-generated from the task.
3. The baseline is measured: the model without the skill.
4. The with-skill score is measured: the model with SKILL.md injected.
5. Skill lift is reported on two axes: accuracy improvement and change in token usage.
6. If the improvement is insufficient, the tool iterates automatically.
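
The measurement steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not upskill's internal code: `EvalResult`, `skill_lift`, `should_iterate`, and the 5-point threshold are hypothetical names and values, and the token counts are made up. The pass counts are chosen to reproduce a 60% to 95% change.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical summary of one evaluation (not a type from upskill)."""
    passed: int        # test cases that passed
    total: int         # test cases attempted
    avg_tokens: float  # mean tokens used per test case

    @property
    def accuracy(self) -> float:
        return self.passed / self.total if self.total else 0.0


def skill_lift(baseline: EvalResult, with_skill: EvalResult) -> dict:
    """Report both axes: change in accuracy and change in token usage."""
    return {
        "accuracy_delta": with_skill.accuracy - baseline.accuracy,
        "token_delta": with_skill.avg_tokens - baseline.avg_tokens,
    }


def should_iterate(lift: dict, min_accuracy_gain: float = 0.05) -> bool:
    """Assumed stopping rule: keep iterating while the gain is below a threshold."""
    return lift["accuracy_delta"] < min_accuracy_gain


# Made-up numbers for illustration: 12/20 without the skill, 19/20 with it.
baseline = EvalResult(passed=12, total=20, avg_tokens=1500.0)
with_skill = EvalResult(passed=19, total=20, avg_tokens=1300.0)
lift = skill_lift(baseline, with_skill)

print(f"accuracy {baseline.accuracy:.0%} -> {with_skill.accuracy:.0%}")
print(f"token change {lift['token_delta']:+.0f}")
print("iterate again:", should_iterate(lift))
```

Keeping the two deltas separate, rather than folding them into one score, makes it easy to spot a skill that raises accuracy while also raising token cost.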

Output Structure

./skills/<skill-name>/
├── SKILL.md          # Main instructions (~500 tokens)
└── skill_meta.json   # Metadata and test cases

SKILL.md Format

---
name: my-skill
description: What this skill teaches the agent.
---

# Skill Title

## Overview
Brief description of the domain knowledge.

## Key Concepts
- Concept 1: explanation
- Concept 2: explanation

## Examples
Code examples, patterns, configurations.

## Common Pitfalls
What to avoid and why.
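
A quick sanity check of this layout can be scripted. The sketch below is a minimal, standard-library-only parser that assumes the frontmatter contains only flat `key: value` lines. `parse_skill` and `missing_fields` are hypothetical helpers, not part of upskill or the Agent Skills tooling.

```python
def parse_skill(text: str) -> tuple:
    """Split a SKILL.md string into (frontmatter dict, body).

    Handles only flat `key: value` lines between the two `---` markers;
    anything richer would need a real YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        # Index of the closing '---' marker.
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def missing_fields(meta: dict) -> list:
    """Return whichever of the two frontmatter fields shown above are absent or empty."""
    return [key for key in ("name", "description") if not meta.get(key)]


EXAMPLE = """---
name: my-skill
description: What this skill teaches the agent.
---

# Skill Title

## Overview
Brief description of the domain knowledge.
"""

meta, body = parse_skill(EXAMPLE)
print(meta["name"], missing_fields(meta))
```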

Test Cases Format (skill_meta.json)

{
  "cases": [
    {
      "input": "Create a build.toml for H100",
      "expected": {"contains": "9.0"}
    },
    {
      "input": "Write a CUDA kernel template",
      "expected": {"contains": "cuda_runtime.h"}
    }
  ]
}
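
To make the `contains` expectation concrete, here is a minimal sketch of how such cases could be checked. It assumes `contains` means a plain substring match on the model's output, which is an interpretation rather than something confirmed by the tool. `case_passes`, `pass_rate`, and the sample outputs are hypothetical.

```python
import json

# The two cases from the example above, embedded so the sketch is self-contained.
SKILL_META = """
{
  "cases": [
    {"input": "Create a build.toml for H100", "expected": {"contains": "9.0"}},
    {"input": "Write a CUDA kernel template", "expected": {"contains": "cuda_runtime.h"}}
  ]
}
"""


def case_passes(case: dict, model_output: str) -> bool:
    """Assumed semantics: pass when the expected substring appears in the output."""
    return case["expected"]["contains"] in model_output


def pass_rate(cases: list, outputs: list) -> float:
    """Fraction of cases whose paired output satisfies its expectation."""
    results = [case_passes(case, out) for case, out in zip(cases, outputs)]
    return sum(results) / len(results) if results else 0.0


cases = json.loads(SKILL_META)["cases"]

# Made-up model outputs, one per case, for illustration only.
outputs = [
    'cuda-capabilities = ["9.0"]',  # contains "9.0", so the first case passes
    "#include <stdio.h>",           # lacks "cuda_runtime.h", so the second fails
]
print(pass_rate(cases, outputs))
```

Substring checks are cheap and deterministic, but they only confirm that a key token appears, not that the surrounding answer is correct.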

Evaluation Output

Generating skill with sonnet...
Generating test cases...
Evaluating on sonnet... (attempt 1)
 60% -> 95% (+35%) OK

 my-skill
 SKILL.md ~520 tokens

 baseline   ████████████░░░░░░░░ 60%
 with skill ███████████████████░ 95% (+35%)

Saved to ./skills/my-skill

Cross-Model Comparison

┃ Model ┃ Pass Rate  ┃ Avg Assertions ┃ Avg Tokens ┃
│ haiku │ 4/5 (80%)  │ 2.8/3          │ 1250       │
│ kimi  │ 5/5 (100%) │ 3.0/3          │ 1890       │
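
The columns in a comparison like this can be derived from per-run records. The sketch below is illustrative only: the record layout, the `summarize` helper, and the rule that a run passes only when every assertion passes are assumptions. The data is made up to reproduce the figures in the table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-run records: (model, assertions_passed, assertions_total, tokens).
RUNS = [
    ("haiku", 3, 3, 1200), ("haiku", 3, 3, 1300), ("haiku", 3, 3, 1250),
    ("haiku", 3, 3, 1250), ("haiku", 2, 3, 1250),
    ("kimi", 3, 3, 1890), ("kimi", 3, 3, 1890), ("kimi", 3, 3, 1890),
    ("kimi", 3, 3, 1890), ("kimi", 3, 3, 1890),
]


def summarize(runs: list) -> dict:
    """Group runs by model and compute pass rate, mean assertions, mean tokens."""
    by_model = defaultdict(list)
    for model, passed, total, tokens in runs:
        by_model[model].append((passed, total, tokens))

    summary = {}
    for model, rows in by_model.items():
        # Assumption: a run counts as a pass only when every assertion passed.
        passes = sum(1 for passed, total, _ in rows if passed == total)
        summary[model] = {
            "passes": passes,
            "runs": len(rows),
            "avg_assertions": mean([passed for passed, _, _ in rows]),
            "avg_tokens": mean([tokens for _, _, tokens in rows]),
        }
    return summary


for model, row in summarize(RUNS).items():
    rate = row["passes"] / row["runs"]
    print(f"{model}: {row['passes']}/{row['runs']} ({rate:.0%}), "
          f"{row['avg_assertions']:.1f} assertions, {row['avg_tokens']:.0f} tokens")
```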

Best Practices

- Use expensive models as teachers: Opus/GPT-5 for generation, Haiku/local for evaluation
- Always evaluate per-model: a skill that helps one model may not help another
- Measure both axes: accuracy AND token usage matter
- Iterate: if lift is insufficient, refine the skill with `--from`
- Keep skills focused: ~500 tokens is ideal; don't bloat with unnecessary info
- Skills are for hard/specialized tasks: don't create skills for things models already do well
- Version control skills: they're just files, treat them like code
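
For the size guideline, a rough pre-commit check might look like the following sketch. It uses a characters-per-token heuristic (about 4 characters per token is a common rule of thumb for English text), not a real tokenizer, so treat the count as an estimate. `rough_token_count`, `within_budget`, and the 20% tolerance are hypothetical.

```python
def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character count; a heuristic, not a tokenizer."""
    return round(len(text) / chars_per_token)


def within_budget(text: str, budget: int = 500, tolerance: float = 0.2) -> bool:
    """True when the estimate is at most `budget` plus the allowed tolerance."""
    return rough_token_count(text) <= budget * (1 + tolerance)


# Stand-in for the contents of a SKILL.md file (made up for illustration).
skill_text = "Prefer explicit shapes in kernel signatures. " * 40

print(rough_token_count(skill_text), within_budget(skill_text))
```

For an exact count, the target model's own tokenizer would be needed, since token counts differ between model families.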

Using Skills with Agent Tools

Skills follow the Agent Skills specification and work with:

| Tool | Skill Location |
|------|----------------|
| Claude Code | `.claude/skills/{name}/SKILL.md` |
| Codex | `.codex/skills/{name}/SKILL.md` |
| Cursor | `.cursor/skills/{name}/SKILL.md` |
| OpenCode | `.opencode/skills/{name}/SKILL.md` |
| Clawdbot | `skills/{name}/SKILL.md` |

Simply copy the generated skill directory to the appropriate location.
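
The copy step can be scripted as in the sketch below. The directory names come from the table above; the dictionary keys, the `install_skill` helper, and the idea of resolving paths against a project root are assumptions made for illustration. The demo runs inside a temporary directory so nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path

# Per-tool skills directories from the table above, relative to a project root.
# The dictionary keys are hypothetical labels chosen for this sketch.
SKILL_DIRS = {
    "claude-code": ".claude/skills",
    "codex": ".codex/skills",
    "cursor": ".cursor/skills",
    "opencode": ".opencode/skills",
    "clawdbot": "skills",
}


def install_skill(skill_dir: Path, project_root: Path, tool: str) -> Path:
    """Copy a generated skill directory to <project_root>/<tool dir>/<name>/."""
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} does not contain SKILL.md")
    target = project_root / SKILL_DIRS[tool] / skill_dir.name
    # copytree creates missing parent directories; dirs_exist_ok allows re-installs.
    shutil.copytree(skill_dir, target, dirs_exist_ok=True)
    return target


# Demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "generated" / "my-skill"
    src.mkdir(parents=True)
    (src / "SKILL.md").write_text("---\nname: my-skill\ndescription: demo\n---\n")

    dest = install_skill(src, root / "project", "cursor")
    installed_ok = (dest / "SKILL.md").is_file()
    print(dest.relative_to(root), installed_ok)
```

Copying the whole directory rather than only SKILL.md keeps `skill_meta.json` next to the skill, so it can be re-evaluated later.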

Common Workflows

Transfer Knowledge to Local Models

# 1. Start local model server
llama-server -hf unsloth/GLM-4.7-Flash-GGUF:Q4_K_M

# 2. Generate skill with Opus, evaluate on local
upskill generate "your specialized task" \
  --model opus \
  --eval-model "unsloth/GLM-4.7-Flash-GGUF:Q4_K_M" \
  --eval-base-url http://localhost:8080/v1

# 3. If lift is good, deploy the skill
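mkdir -p ~/.claude/skills
# No trailing slash on the source, so the directory itself is copied (not just its contents)
cp -r ./skills/my-skill ~/.claude/skills/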

Build a Skill from Existing Agent Trace

# Export trace from Claude Code, Cursor, etc.
# Then generate a skill from it
upskill generate "the task description" --from ./trace.md

# Evaluate on target models
upskill eval ./skills/my-skill/ --model haiku --model sonnet

Iterate on a Skill

# Start from existing skill, add improvements
upskill generate "add error handling for edge cases" \
  --from ./skills/my-skill/

# Re-evaluate to confirm improvement
upskill eval ./skills/my-skill/ --model haiku --runs 5

Resources

Repo: https://github.com/huggingface/upskill
Blog: https://huggingface.co/blog/upskill
Agent Skills Spec: https://agentskills.io
Example Skill: https://huggingface.co/hf-skills/h100-diffusers-kernel-builder

# Supported AI Coding Agents

This skill follows the SKILL.md standard and is listed as compatible with the following AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
