battle

by @grahama1970 in Tools

# Install this skill:

npx skills add grahama1970/agent-skills --skill "battle"

This installs only the battle skill from the multi-skill repository.

# Description

# SKILL.md

name: battle
description: >
Red vs Blue team security competition orchestrator. Runs long-running overnight
battles with 1000s of interactions, scoring, and insight generation.
allowed-tools:
- Bash
- Read
triggers:
- battle
- thunderdome
- red vs blue
- overnight battle
- security competition
- red team vs blue team
metadata:
short-description: Red vs Blue team security competition
requires: docker

Battle Skill

Red vs Blue Team Security Competition Orchestrator

Pits a Red Team (attack) against a Blue Team (defense) in a long-running competitive loop. Each team can use every skill under .pi/skills to attack or defend the target codebase.

Architecture

Based on research into the RvB framework, DARPA AIxCC, and Microsoft PyRIT:

┌─────────────────────────────────────────────────────────┐
│                 Battle Orchestrator                      │
│  - Game loop (RvB pattern)                              │
│  - Concurrent Red/Blue execution                        │
│  - Entropy-driven termination                           │
│  - Checkpointing for overnight runs                     │
└─────────────────────────────────────────────────────────┘
         │                              │
    ┌────┴────┐                    ┌────┴────┐
    │ Red Team │                   │ Blue Team│
    │ (Thread) │                   │ (Thread) │
    ├──────────┤                   ├──────────┤
    │ Skills:  │                   │ Skills:  │
    │ - hack   │                   │ - anvil  │
    │ - memory │                   │ - memory │
    └──────────┘                   └──────────┘
         │                              │
         └──────────┬───────────────────┘
                    │
    ┌───────────────┴────────────────────┐
    │           Digital Twin              │
    │  ┌─────────────────────────────┐   │
    │  │ Mode: git_worktree          │   │
    │  │   - Red attacks arena       │   │
    │  │   - Blue patches workspace  │   │
    │  │   - Cherry-pick to test     │   │
    │  ├─────────────────────────────┤   │
    │  │ Mode: docker                │   │
    │  │   - Isolated containers     │   │
    │  │   - Battle network          │   │
    │  ├─────────────────────────────┤   │
    │  │ Mode: qemu                  │   │
    │  │   - Emulated firmware       │   │
    │  │   - GDB attach points       │   │
    │  └─────────────────────────────┘   │
    └────────────────────────────────────┘
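
The diagram shows the Red and Blue teams running as separate threads under one orchestrator. A minimal sketch of that pattern, assuming each team exposes a callable that takes the round number and returns a list of results. The names `run_round`, `red_turn`, and `blue_turn` are hypothetical and are not taken from the skill's code:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

TeamTurn = Callable[[int], List[str]]


def run_round(round_no: int, red: TeamTurn, blue: TeamTurn) -> Dict[str, List[str]]:
    """Run both teams concurrently for one round and collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        red_future = pool.submit(red, round_no)
        blue_future = pool.submit(blue, round_no)
        # result() re-raises any exception from the team thread, so a crash
        # in one team surfaces in the orchestrator instead of being lost.
        return {"red": red_future.result(), "blue": blue_future.result()}


def red_turn(round_no: int) -> List[str]:
    # Placeholder: a real Red turn would invoke the hack skill.
    return [f"finding-{round_no}"]


def blue_turn(round_no: int) -> List[str]:
    # Placeholder: a real Blue turn would invoke the anvil skill.
    return [f"patch-{round_no}"]
```

Calling `run_round(1, red_turn, blue_turn)` returns one result list per team, keyed by `"red"` and `"blue"`.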

Digital Twin Modes

The battle skill supports multiple target types through its Digital Twin system:

1. Source Code (git_worktree)

For battling over git repositories. Creates isolated git worktrees for each team.

./run.sh battle /path/to/repo --rounds 100
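
One way the per-team worktrees could be set up is with `git worktree add`, one call per team. The sketch below only builds the commands. The branch and directory naming (`battle/<id>/arena`, `<repo>-<id>-workspace`) is an assumption for illustration, not the skill's actual layout:

```python
from pathlib import Path
from typing import Dict, List


def worktree_commands(repo: Path, battle_id: str, base: str = "HEAD") -> Dict[str, List[str]]:
    """Build one `git worktree add` command per team (nothing is executed here)."""
    commands: Dict[str, List[str]] = {}
    # Hypothetical naming: Red gets the "arena", Blue gets the "workspace".
    for team, role in (("red", "arena"), ("blue", "workspace")):
        path = repo.parent / f"{repo.name}-{battle_id}-{role}"
        branch = f"battle/{battle_id}/{role}"
        commands[team] = [
            "git", "-C", str(repo),
            "worktree", "add", "-b", branch, str(path), base,
        ]
    return commands
```

Each command list can be passed to `subprocess.run(cmd, check=True)` to create the worktree on its own branch.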

2. Docker Container (docker)

For battling over containerized applications. Spins up separate containers for each team.

# Using a Docker image
./run.sh battle --docker-image nginx:latest --rounds 100

# Using a Dockerfile in the target directory
./run.sh battle /path/with/Dockerfile --mode docker

3. Firmware/Microprocessor (qemu)

For battling over firmware and embedded systems. Boots the firmware in the QEMU emulator.

# Auto-detect architecture from ELF header
./run.sh battle firmware.elf --rounds 100

# Specify machine type explicitly
./run.sh battle firmware.bin --qemu-machine arm
./run.sh battle firmware.bin --qemu-machine riscv64
./run.sh battle bios.rom --qemu-machine x86_64

Supported QEMU machines:
- arm - ARM Cortex-M (STM32, etc.)
- aarch64 - ARM64
- riscv32/riscv64 - RISC-V
- x86_64/i386 - x86
- mips - MIPS (routers, embedded)
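
Auto-detecting the architecture from an ELF header comes down to reading three fields: `EI_CLASS` (byte 4), `EI_DATA` (byte 5), and `e_machine` (bytes 18 and 19). The sketch below maps them to the machine names listed above. The function name `detect_qemu_machine` is hypothetical, and the skill's own detection logic may differ:

```python
import struct

# e_machine values from the ELF specification.
EM_386, EM_MIPS, EM_ARM, EM_X86_64, EM_AARCH64, EM_RISCV = 3, 8, 40, 62, 183, 243


def detect_qemu_machine(header: bytes) -> str:
    """Map the first 20 bytes of an ELF file to one of the machine names above."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine == EM_RISCV:
        # RISC-V uses one e_machine value; the word size comes from EI_CLASS.
        return "riscv64" if is_64bit else "riscv32"
    table = {
        EM_ARM: "arm",
        EM_AARCH64: "aarch64",
        EM_X86_64: "x86_64",
        EM_386: "i386",
        EM_MIPS: "mips",
    }
    if e_machine not in table:
        raise ValueError(f"unsupported e_machine: {e_machine}")
    return table[e_machine]
```

Raw `.bin` and `.rom` images have no ELF header, which is why those examples pass `--qemu-machine` explicitly.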

4. Copy Mode (fallback)

For non-git directories. Creates simple file copies for each team.

Commands

# Start a battle (10 rounds for testing)
./run.sh battle /path/to/codebase --rounds 10

# Start overnight battle (1000 rounds)
./run.sh battle /path/to/codebase --overnight

# Battle a Docker container
./run.sh battle --docker-image myapp:latest --rounds 100

# Battle firmware with QEMU
./run.sh battle firmware.bin --qemu-machine arm --rounds 100

# Check battle status
./run.sh status

# Resume interrupted battle
./run.sh resume <battle-id>

# Generate report from completed battle
./run.sh report <battle-id>
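
Resuming an interrupted battle depends on a checkpoint that is never left half-written. A minimal sketch of an atomic JSON checkpoint, assuming one file per battle ID. The file layout and function names are hypothetical, and the sketch covers only orchestrator state, not QEMU snapshots:

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(directory: Path, battle_id: str, state: dict) -> Path:
    """Write the state atomically so an interrupted run never leaves a partial file."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{battle_id}.json"
    # Write to a temporary file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, target)  # atomic when both paths are on one filesystem
    return target


def load_checkpoint(directory: Path, battle_id: str) -> dict:
    """Load the last saved state; raises FileNotFoundError if none exists."""
    with open(directory / f"{battle_id}.json", encoding="utf-8") as handle:
        return json.load(handle)
```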

Scoring System (AIxCC-style)

| Metric | Weight | Description |
|---|---|---|
| Vulnerability Discovery | 1x | Red team finds vulnerability |
| Exploit Proof | +0.5x | Red team proves exploitability |
| Successful Patch | 3x | Blue team patches vulnerability |
| Time Decay | Variable | Faster responses score higher |
| Functionality Preserved | Required | Patches must not break code |
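
The weights above can be combined into per-event points. The sketch below is one possible reading, not the skill's actual scorer: the table only says time decay is "Variable", so the half-life curve and its one-hour default are assumptions, as is applying the decay to both teams:

```python
DISCOVERY_WEIGHT = 1.0   # "Vulnerability Discovery" row
EXPLOIT_BONUS = 0.5      # "Exploit Proof" row
PATCH_WEIGHT = 3.0       # "Successful Patch" row


def time_decay(seconds: float, half_life: float = 3600.0) -> float:
    """Assumed decay curve: the multiplier halves every `half_life` seconds."""
    return 0.5 ** (seconds / half_life)


def red_points(exploit_proven: bool, seconds: float) -> float:
    """Points for one Red finding, with the bonus if exploitability was proven."""
    base = DISCOVERY_WEIGHT + (EXPLOIT_BONUS if exploit_proven else 0.0)
    return base * time_decay(seconds)


def blue_points(functionality_preserved: bool, seconds: float) -> float:
    """Points for one Blue patch; a patch that breaks the code scores nothing."""
    if not functionality_preserved:
        return 0.0
    return PATCH_WEIGHT * time_decay(seconds)
```

Under these assumptions an instantly proven exploit is worth 1.5 points, and a working patch delivered after one half-life is also worth 1.5.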

Scores

- TDSR (True Defense Success Rate): Share of defenses where the vulnerability is fixed AND the code still works
- FDSR (Fake Defense Success Rate): Share of defenses where the attack is blocked but the code is broken
- ASC (Attack Success Count): Total number of unique exploits discovered
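
These three metrics can be computed from per-defense outcomes and a list of exploit identifiers. The sketch assumes both rates use the total number of defense attempts as the denominator, which the text does not specify. `DefenseOutcome` is a hypothetical record type:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class DefenseOutcome:
    attack_blocked: bool  # the exploit no longer works after the patch
    code_works: bool      # regression tests still pass after the patch


def defense_rates(outcomes: Iterable[DefenseOutcome]) -> Tuple[float, float]:
    """Return (TDSR, FDSR) as fractions of all defense attempts."""
    items = list(outcomes)
    if not items:
        return (0.0, 0.0)
    true_defenses = sum(1 for o in items if o.attack_blocked and o.code_works)
    fake_defenses = sum(1 for o in items if o.attack_blocked and not o.code_works)
    return (true_defenses / len(items), fake_defenses / len(items))


def attack_success_count(exploit_ids: Iterable[str]) -> int:
    """ASC: count each distinct exploit once, however often it was rediscovered."""
    return len(set(exploit_ids))
```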

Game Loop (Learning-Based)

Each round follows a learn → act → reflect pattern, followed by scoring and a checkpoint:

Round k:

┌─────────────────────────────────────────────────────────────┐
│                    1. RESEARCH PHASE                         │
├─────────────────────────────────────────────────────────────┤
│ Red Team:                      Blue Team:                    │
│ - Recall past attack attempts  - Recall past defenses        │
│ - Query /dogpile for new       - Query /dogpile for          │
│   exploitation techniques        hardening strategies        │
│ - Review opponent's patterns   - Analyze attack evolution    │
│ (Budget: 3 research calls max)                               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    2. ACTION PHASE                           │
├─────────────────────────────────────────────────────────────┤
│ Red Team Attack:               Blue Team Defense:            │
│ - Execute learned strategy     - Apply patches via anvil     │
│ - AFL++ fuzzing with coverage  - Verify via QCOW2 overlay    │
│ - Collect crashes/findings     - Run regression tests        │
│ - Tag findings with /taxonomy  - Tag patches with /taxonomy  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   3. REFLECTION PHASE                        │
├─────────────────────────────────────────────────────────────┤
│ Both Teams:                                                  │
│ - Archive round episode (actions, outcomes, learnings)       │
│ - Store successful strategies in /memory                     │
│ - Update belief about opponent's capabilities                │
│ - Evolve strategy for next round                            │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   4. SCORING & CHECKPOINT                    │
├─────────────────────────────────────────────────────────────┤
│ - Calculate AIxCC-style scores                               │
│ - Check termination conditions                               │
│ - Save checkpoint (QEMU state + team memories)              │
└─────────────────────────────────────────────────────────────┘
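
The four phases can be read as a single function per team per round. The sketch below is an illustration under assumptions: the research budget is treated as per team per round, and `search`, `act`, and `reflect` are hypothetical callables standing in for /dogpile, the attack or patch step, and the /memory update:

```python
from typing import Any, Callable, Dict, List


class ResearchBudget:
    """Caps research calls per round (the diagram allows at most 3)."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.used = 0

    def query(self, search: Callable[[str], str], question: str) -> str:
        if self.used >= self.limit:
            raise RuntimeError("research budget exhausted for this round")
        self.used += 1
        return search(question)


def play_round(
    questions: List[str],
    search: Callable[[str], str],
    act: Callable[[List[str]], Any],
    reflect: Callable[[Any], str],
) -> Dict[str, Any]:
    budget = ResearchBudget()
    # 1. Research: questions beyond the budget are dropped, not queried.
    notes = [budget.query(search, q) for q in questions[: budget.limit]]
    # 2. Action: attack or patch using what was learned.
    outcome = act(notes)
    # 3. Reflection: turn the outcome into a learning for the next round.
    learnings = reflect(outcome)
    # 4. Scoring and checkpointing would consume this episode record.
    return {"research": notes, "outcome": outcome, "learnings": learnings}
```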

Memory Architecture

Each team maintains isolated knowledge:

battle_red_<battle_id>/           battle_blue_<battle_id>/
├── strategies/                   ├── strategies/
│   ├── successful_attacks        │   ├── successful_patches
│   └── failed_attempts           │   └── broken_defenses
├── research/                     ├── research/
│   └── dogpile_results           │   └── dogpile_results
├── episodes/                     ├── episodes/
│   ├── round_001.json            │   ├── round_001.json
│   └── round_002.json            │   └── round_002.json
└── taxonomy/                     └── taxonomy/
    ├── cwe_classifications       ├── mitigation_types
    └── severity_scores           └── effectiveness_scores

Neither team can access the opponent's memory, which keeps the learning genuinely adversarial.
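
The per-team layout can be created with a few `pathlib` calls. This is a sketch of the directory tree only. The helper names are hypothetical, and the real skill may store this data through the memory skill rather than on disk:

```python
from pathlib import Path

# Top-level folders shared by both teams in the layout above.
MEMORY_DIRS = ("strategies", "research", "episodes", "taxonomy")


def create_team_memory(root: Path, team: str, battle_id: str) -> Path:
    """Create battle_<team>_<battle_id>/ with its four subdirectories."""
    if team not in ("red", "blue"):
        raise ValueError(f"unknown team: {team}")
    base = root / f"battle_{team}_{battle_id}"
    for name in MEMORY_DIRS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


def episode_path(base: Path, round_no: int) -> Path:
    """Path of a round's episode file, zero-padded as in round_001.json."""
    return base / "episodes" / f"round_{round_no:03d}.json"
```

Because each team gets its own root directory, isolation can be enforced by handing a team only its own `base` path.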

Termination Conditions

The battle ends when ANY of these conditions is met:

- Null Production: Both teams fail to generate new findings for 3 rounds
- Maximum Rounds: The configured round limit is reached
- Metric Convergence: Scores stay stable for 5 consecutive rounds
- Kill Switch: Manual termination via ./run.sh stop
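
The four conditions can be checked once per round. In the sketch below, "stable" is taken to mean the score pair did not change at all across the window, and the 3-round null window is treated as the most recent 3 rounds. Both are assumptions, and `termination_reason` is a hypothetical name:

```python
from typing import Optional, Sequence, Tuple


def termination_reason(
    new_findings: Sequence[int],
    scores: Sequence[Tuple[float, float]],
    max_rounds: int,
    stop_requested: bool = False,
    null_window: int = 3,
    stable_window: int = 5,
) -> Optional[str]:
    """Return the first matching condition, or None if the battle continues.

    new_findings: per-round count of new findings from both teams combined.
    scores: per-round cumulative (red, blue) scores.
    """
    rounds_played = len(scores)
    if stop_requested:
        return "kill_switch"
    if rounds_played >= max_rounds:
        return "maximum_rounds"
    recent = new_findings[-null_window:]
    if len(recent) == null_window and all(n == 0 for n in recent):
        return "null_production"
    if rounds_played >= stable_window and len(set(scores[-stable_window:])) == 1:
        return "metric_convergence"
    return None
```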

Task Monitor Integration

Battles register with task-monitor for overnight progress tracking:

# View battle progress in TUI
.pi/skills/task-monitor/run.sh tui --filter battle

Report Output

After a battle completes, the skill generates a report with these sections:

- Executive Summary: Winner, key metrics, risk score
- Vulnerability Report: By severity, category, remediation status
- Attack Evolution: How the Red team adapted over rounds
- Defense Timeline: Blue team improvements over time
- Recommendations: Prioritized security improvements

Leveraged Skills

| Skill | Team | Purpose |
|---|---|---|
| hack | Red | Scanning, auditing, exploitation |
| anvil | Blue | Multi-agent patching (Thunderdome) |
| memory | Both | Recall prior strategies |
| treesitter | Blue | Code structure analysis |
| taxonomy | Both | Classify findings |
| task-monitor | Orchestrator | Progress tracking |
| docker-ops | Both | Container management |

Example Battle

# Start 100-round battle on current project
./run.sh battle . --rounds 100

# Output:
# Battle ID: battle_20250128_221500
# Target: /home/user/project
# Rounds: 100
#
# Registering with task-monitor...
# Starting Round 1/100...
# [Red] Scanning target with hack...
# [Red] Found 3 potential vulnerabilities
# [Blue] Analyzing attack logs...
# [Blue] Generating patch for SQL injection...
# [Blue] Patch applied, running verification...
# Round 1 complete. Red: 3 pts, Blue: 9 pts
# ...
#
# Battle Complete!
# Winner: Blue Team (847 pts vs 423 pts)
# Report: ./reports/battle_20250128_221500.md

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf