grahama1970

battle

# Install this skill:
npx skills add grahama1970/agent-skills --skill "battle"


# Description

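Red vs Blue team security competition orchestrator. Runs long overnight battles with thousands of interactions, scoring, and insight generation.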

# SKILL.md


name: battle
description: >
  Red vs Blue team security competition orchestrator. Runs long-running overnight
  battles with 1000s of interactions, scoring, and insight generation.
allowed-tools:
  - Bash
  - Read
triggers:
  - battle
  - thunderdome
  - red vs blue
  - overnight battle
  - security competition
  - red team vs blue team
metadata:
  short-description: Red vs Blue team security competition
  requires: docker


Battle Skill

Red vs Blue Team Security Competition Orchestrator

Pits a Red Team (attack) against a Blue Team (defense) in a long-running competitive loop. Each team can use any of the skills in .pi/skills to attack or defend a target codebase.

Architecture

Based on research into the RvB framework, DARPA AIxCC, and Microsoft PyRIT:

┌─────────────────────────────────────────────────────────┐
│                 Battle Orchestrator                      │
│  - Game loop (RvB pattern)                              │
│  - Concurrent Red/Blue execution                        │
│  - Entropy-driven termination                           │
│  - Checkpointing for overnight runs                     │
└─────────────────────────────────────────────────────────┘
         │                              │
    ┌────┴────┐                    ┌────┴────┐
    │ Red Team │                   │ Blue Team│
    │ (Thread) │                   │ (Thread) │
    ├──────────┤                   ├──────────┤
    │ Skills:  │                   │ Skills:  │
    │ - hack   │                   │ - anvil  │
    │ - memory │                   │ - memory │
    └──────────┘                   └──────────┘
         │                              │
         └──────────┬───────────────────┘
                    │
    ┌───────────────┴────────────────────┐
    │           Digital Twin              │
    │  ┌─────────────────────────────┐   │
    │  │ Mode: git_worktree          │   │
    │  │   - Red attacks arena       │   │
    │  │   - Blue patches workspace  │   │
    │  │   - Cherry-pick to test     │   │
    │  ├─────────────────────────────┤   │
    │  │ Mode: docker                │   │
    │  │   - Isolated containers     │   │
    │  │   - Battle network          │   │
    │  ├─────────────────────────────┤   │
    │  │ Mode: qemu                  │   │
    │  │   - Emulated firmware       │   │
    │  │   - GDB attach points       │   │
    │  └─────────────────────────────┘   │
    └────────────────────────────────────┘
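The orchestrator box lists concurrent Red/Blue execution, and each team box runs on its own thread. A minimal Python sketch of one round under that model, assuming hypothetical `red_turn` and `blue_turn` callables that return lists of findings or patches (they are not part of the skill's documented interface):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical team callable: takes the round number, returns that team's results.
TeamTurn = Callable[[int], List[str]]


def run_round(round_no: int, red_turn: TeamTurn, blue_turn: TeamTurn) -> Tuple[List[str], List[str]]:
    """Run Red and Blue for one round on separate threads and wait for both to finish."""
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="battle") as pool:
        red_future = pool.submit(red_turn, round_no)
        blue_future = pool.submit(blue_turn, round_no)
        # .result() re-raises any exception from a team thread in the orchestrator.
        return red_future.result(), blue_future.result()


if __name__ == "__main__":
    print(run_round(1, lambda k: [f"finding-{k}"], lambda k: [f"patch-{k}"]))
```

Joining on both futures before the round ends keeps the teams in lockstep, which is what makes per-round scoring and checkpointing well defined.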

Digital Twin Modes

The battle skill supports multiple target types through its Digital Twin system:

1. Source Code (git_worktree)

For battles over git repositories. Creates an isolated git worktree for each team.

./run.sh battle /path/to/repo --rounds 100
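A minimal sketch of the worktree layout, assuming Red works in an `arena` worktree and Blue in a `workspace` worktree as in the diagram above. The branch names and the sibling `<repo>_<battle_id>` directory are hypothetical; the functions only build `git` argument lists so they can be inspected or handed to `subprocess.run`:

```python
import shlex
from pathlib import Path
from typing import List


def worktree_commands(repo: Path, battle_id: str) -> List[List[str]]:
    """Build (but do not run) git commands giving each team its own worktree and branch.

    Directory names (arena, workspace) follow the diagram; branch names are hypothetical.
    """
    root = repo.parent / f"{repo.name}_{battle_id}"
    commands = []
    for team, subdir in (("red", "arena"), ("blue", "workspace")):
        commands.append([
            "git", "-C", str(repo), "worktree", "add",
            "-b", f"battle/{battle_id}/{team}",  # new branch per team, starting at HEAD
            str(root / subdir), "HEAD",
        ])
    return commands


def cherry_pick_command(arena: Path, patch_commit: str) -> List[str]:
    """Apply one Blue patch commit onto Red's arena so it can be re-tested there."""
    return ["git", "-C", str(arena), "cherry-pick", patch_commit]


if __name__ == "__main__":
    for cmd in worktree_commands(Path("/path/to/repo"), "battle_20250128_221500"):
        print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Because all worktrees of a repository share one object store and one set of branch refs, a commit Blue makes in its workspace is immediately visible from the arena, so the cherry-pick step needs no push or fetch.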

2. Docker Container (docker)

For battles over containerized applications. Spins up a separate container for each team.

# Using a Docker image
./run.sh battle --docker-image nginx:latest --rounds 100

# Using a Dockerfile in the target directory
./run.sh battle /path/with/Dockerfile --mode docker
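The diagram pairs isolated containers with a dedicated battle network. A minimal sketch of that setup as plain `docker` CLI invocations, assuming one shared network per battle; the container and network names, and the use of `--internal` to cut the network off from outside hosts, are assumptions rather than documented behavior of the skill:

```python
import shlex
from typing import List


def docker_battle_commands(image: str, battle_id: str) -> List[List[str]]:
    """Build docker commands: one isolated network, one detached container per team.

    Names such as "<battle_id>_net" and "<battle_id>_red" are hypothetical.
    """
    network = f"{battle_id}_net"
    # --internal: containers on this network cannot reach external hosts.
    commands = [["docker", "network", "create", "--internal", network]]
    for team in ("red", "blue"):
        commands.append([
            "docker", "run", "-d",
            "--name", f"{battle_id}_{team}",
            "--network", network,
            image,
        ])
    return commands


if __name__ == "__main__":
    for cmd in docker_battle_commands("nginx:latest", "battle_20250128_221500"):
        print(shlex.join(cmd))
```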

3. Firmware/Microprocessor (qemu)

For battles over firmware and embedded systems. Boots the firmware in the QEMU emulator.

# Auto-detect architecture from ELF header
./run.sh battle firmware.elf --rounds 100

# Specify machine type explicitly
./run.sh battle firmware.bin --qemu-machine arm
./run.sh battle firmware.bin --qemu-machine riscv64
./run.sh battle bios.rom --qemu-machine x86_64

Supported QEMU machines:
- arm: ARM Cortex-M (STM32, etc.)
- aarch64: ARM64
- riscv32 / riscv64: RISC-V
- x86_64 / i386: x86
- mips: MIPS (routers, embedded)
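Auto-detecting the architecture from an ELF header comes down to two fields: `EI_CLASS`/`EI_DATA` in `e_ident` (32/64-bit, little/big endian) and the 16-bit `e_machine` at offset 18. A minimal sketch, assuming the skill maps `e_machine` directly onto the machine names listed above (its actual detection logic is not documented):

```python
import struct
from typing import Optional

# e_machine values from the ELF specification.
EM_386, EM_MIPS, EM_ARM, EM_X86_64, EM_AARCH64, EM_RISCV = 3, 8, 40, 62, 183, 243


def qemu_machine_from_elf(header: bytes) -> Optional[str]:
    """Map the first bytes of an ELF file to a --qemu-machine name, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not ELF (e.g. a raw .bin or .rom image)
    ei_class, ei_data = header[4], header[5]  # class: 1=32-bit, 2=64-bit; data: 1=LE, 2=BE
    if ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine == EM_RISCV:
        return "riscv64" if ei_class == 2 else "riscv32"
    return {
        EM_ARM: "arm",
        EM_AARCH64: "aarch64",
        EM_X86_64: "x86_64",
        EM_386: "i386",
        EM_MIPS: "mips",
    }.get(e_machine)


def detect_file(path: str) -> Optional[str]:
    """Read just the ELF header of a firmware file and classify it."""
    with open(path, "rb") as f:
        return qemu_machine_from_elf(f.read(64))
```

Raw images such as firmware.bin or bios.rom carry no such header, which is why the examples above pass --qemu-machine explicitly for them.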

4. Copy Mode (fallback)

For directories that are not git repositories. Creates a plain copy of the target files for each team.
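A minimal sketch of the copy fallback, assuming each team simply gets its own full copy of the target under a per-team directory (the `red`/`blue` directory names are hypothetical):

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def copy_mode_workspaces(target: Path, dest_root: Path) -> Dict[str, Path]:
    """Give each team an independent copy of a non-git target directory."""
    copies = {}
    for team in ("red", "blue"):
        dest = dest_root / team
        # symlinks=True copies links as links instead of following them out of the target.
        # copytree raises FileExistsError if dest already exists, so stale copies are never mixed in.
        shutil.copytree(target, dest, symlinks=True)
        copies[team] = dest
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "target"
        src.mkdir()
        (src / "main.c").write_text("int main(void) { return 0; }\n")
        print(copy_mode_workspaces(src, Path(tmp) / "teams"))
```

Unlike the worktree mode there is no shared history here, so moving a Blue patch into Red's copy would have to be done by copying files or applying a diff.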

Commands

# Start a battle (10 rounds for testing)
./run.sh battle /path/to/codebase --rounds 10

# Start overnight battle (1000 rounds)
./run.sh battle /path/to/codebase --overnight

# Battle a Docker container
./run.sh battle --docker-image myapp:latest --rounds 100

# Battle firmware with QEMU
./run.sh battle firmware.bin --qemu-machine arm --rounds 100

# Check battle status
./run.sh status

# Resume interrupted battle
./run.sh resume <battle-id>

# Generate report from completed battle
./run.sh report <battle-id>
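Resuming implies that the orchestrator persists enough state after each round to pick up where it stopped. A minimal sketch of that idea, assuming one JSON checkpoint file per battle ID; the file layout and function names are hypothetical, and the skill's real checkpoints (which per the game loop also cover QEMU state and team memories) are not documented. The ID format matches the example output further down:

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path


def new_battle_id(now: datetime) -> str:
    """Produce IDs shaped like battle_20250128_221500."""
    return now.strftime("battle_%Y%m%d_%H%M%S")


def save_checkpoint(directory: Path, battle_id: str, state: dict) -> Path:
    """Write state to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    directory.mkdir(parents=True, exist_ok=True)
    final = directory / f"{battle_id}.json"
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")  # same dir: same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, final)  # atomic rename on POSIX filesystems
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise
    return final


def load_checkpoint(directory: Path, battle_id: str) -> dict:
    """Read back the last complete checkpoint for a battle."""
    return json.loads((directory / f"{battle_id}.json").read_text())
```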

Scoring System (AIxCC-style)

| Metric | Weight | Description |
|---|---|---|
| Vulnerability Discovery | 1x | Red team finds a vulnerability |
| Exploit Proof | +0.5x | Red team proves exploitability |
| Successful Patch | 3x | Blue team patches a vulnerability |
| Time Decay | Variable | Faster responses score higher |
| Functionality Preserved | Required | Patches must not break code |
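A minimal sketch of how the weights in the table could combine, assuming the exploit bonus is added on top of discovery, "Required" acts as a gate (a patch that breaks the code earns nothing), and time decay is an exponential half-life multiplier applied to both teams. The decay curve and half-life are hypothetical, since the table only says "Variable":

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

DISCOVERY, EXPLOIT_BONUS, PATCH = 1.0, 0.5, 3.0  # weights from the table


@dataclass
class Finding:
    exploit_proven: bool = False
    red_elapsed_s: float = 0.0   # time Red took to find it
    patched: bool = False
    tests_pass: bool = True      # "Functionality Preserved" gate for Blue's patch
    blue_elapsed_s: float = 0.0  # time Blue took to patch it


def time_factor(elapsed_s: float, half_life_s: float = 3600.0) -> float:
    """Hypothetical decay: full credit at 0 s, half credit after one half-life."""
    return 0.5 ** (max(elapsed_s, 0.0) / half_life_s)


def score_round(findings: Iterable[Finding]) -> Tuple[float, float]:
    red = blue = 0.0
    for f in findings:
        red += (DISCOVERY + (EXPLOIT_BONUS if f.exploit_proven else 0.0)) * time_factor(f.red_elapsed_s)
        if f.patched and f.tests_pass:  # broken patches score zero
            blue += PATCH * time_factor(f.blue_elapsed_s)
    return red, blue


if __name__ == "__main__":
    print(score_round([Finding(patched=True) for _ in range(3)]))
```

With three unproven findings that Blue patches immediately, this gives Red 3 points and Blue 9, the same split as Round 1 of the example battle.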

Scores

  • TDSR (True Defense Success Rate): Vulnerabilities fixed AND code works
  • FDSR (Fake Defense Success Rate): Attack blocked but code broken
  • ASC (Attack Success Count): Total unique exploits discovered
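A minimal sketch of the three scores, assuming TDSR and FDSR are fractions of all attack outcomes and that "unique" exploits are told apart by a hypothetical `exploit_id` (for example a CWE plus a code location):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Outcome:
    exploit_id: str   # hypothetical identity of an exploit, used for de-duplication
    blocked: bool     # did Blue's defense stop the attack?
    tests_pass: bool  # does the defended code still work?


def defense_metrics(outcomes: List[Outcome]) -> Dict[str, float]:
    """Compute TDSR, FDSR and ASC over a list of attack outcomes."""
    total = len(outcomes)
    true_def = sum(1 for o in outcomes if o.blocked and o.tests_pass)
    fake_def = sum(1 for o in outcomes if o.blocked and not o.tests_pass)
    return {
        "TDSR": true_def / total if total else 0.0,
        "FDSR": fake_def / total if total else 0.0,
        "ASC": len({o.exploit_id for o in outcomes}),
    }
```

Tracking FDSR separately matters because a "patch" that just deletes the vulnerable feature would otherwise look like a perfect defense.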

Game Loop (Learning-Based)

Each round follows a learn → act → reflect pattern:

Round k:

┌─────────────────────────────────────────────────────────────┐
│                    1. RESEARCH PHASE                         │
├─────────────────────────────────────────────────────────────┤
│ Red Team:                      Blue Team:                    │
│ - Recall past attack attempts  - Recall past defenses        │
│ - Query /dogpile for new       - Query /dogpile for          │
│   exploitation techniques        hardening strategies        │
│ - Review opponent's patterns   - Analyze attack evolution    │
│ (Budget: 3 research calls max)                               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    2. ACTION PHASE                           │
├─────────────────────────────────────────────────────────────┤
│ Red Team Attack:               Blue Team Defense:            │
│ - Execute learned strategy     - Apply patches via anvil     │
│ - AFL++ fuzzing with coverage  - Verify via QCOW2 overlay    │
│ - Collect crashes/findings     - Run regression tests        │
│ - Tag findings with /taxonomy  - Tag patches with /taxonomy  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   3. REFLECTION PHASE                        │
├─────────────────────────────────────────────────────────────┤
│ Both Teams:                                                  │
│ - Archive round episode (actions, outcomes, learnings)       │
│ - Store successful strategies in /memory                     │
│ - Update belief about opponent's capabilities                │
│ - Evolve strategy for next round                            │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   4. SCORING & CHECKPOINT                    │
├─────────────────────────────────────────────────────────────┤
│ - Calculate AIxCC-style scores                               │
│ - Check termination conditions                               │
│ - Save checkpoint (QEMU state + team memories)              │
└─────────────────────────────────────────────────────────────┘
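A minimal sketch of one round as four strictly ordered phases, assuming the "3 research calls max" budget applies per team per round and is reset at the start of each round. The phase handlers and their signature are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict

PHASES = ("research", "action", "reflection", "scoring_checkpoint")
TEAMS = ("red", "blue")


class ResearchBudgetExceeded(RuntimeError):
    """Raised when a team attempts more research calls than its per-round budget."""


@dataclass
class ResearchBudget:
    limit: int = 3  # "Budget: 3 research calls max" from the diagram
    used: int = 0

    def spend(self) -> None:
        if self.used >= self.limit:
            raise ResearchBudgetExceeded(f"budget of {self.limit} research calls exhausted")
        self.used += 1


# Hypothetical phase handler: receives the round number and each team's research budget.
PhaseHandler = Callable[[int, Dict[str, ResearchBudget]], None]


def play_round(k: int, handlers: Dict[str, PhaseHandler]) -> Dict[str, ResearchBudget]:
    """Run round k's phases in order with fresh per-team research budgets."""
    budgets = {team: ResearchBudget() for team in TEAMS}
    for phase in PHASES:
        handlers[phase](k, budgets)  # only the research handler is expected to call spend()
    return budgets
```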

Memory Architecture

Each team maintains isolated knowledge:

battle_red_<battle_id>/           battle_blue_<battle_id>/
├── strategies/                   ├── strategies/
│   ├── successful_attacks        │   ├── successful_patches
│   └── failed_attempts           │   └── broken_defenses
├── research/                     ├── research/
│   └── dogpile_results           │   └── dogpile_results
├── episodes/                     ├── episodes/
│   ├── round_001.json            │   ├── round_001.json
│   └── round_002.json            │   └── round_002.json
└── taxonomy/                     └── taxonomy/
    ├── cwe_classifications       ├── mitigation_types
    └── severity_scores           └── effectiveness_scores

Teams cannot access each other's memory; this isolation is what makes the learning truly adversarial.
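A minimal sketch of enforcing that isolation on disk, assuming each team's store is a directory named as in the tree above and episodes are JSON files. The class and its methods are hypothetical; isolation here comes from refusing any path that resolves outside the team's own root (requires Python 3.9+ for `Path.is_relative_to`):

```python
import json
import tempfile
from pathlib import Path


class TeamMemory:
    """File-backed store rooted at battle_<team>_<battle_id>/."""

    def __init__(self, base: Path, team: str, battle_id: str) -> None:
        if team not in ("red", "blue"):
            raise ValueError(f"unknown team: {team}")
        self.root = (base / f"battle_{team}_{battle_id}").resolve()
        (self.root / "episodes").mkdir(parents=True, exist_ok=True)

    def _path(self, relative: str) -> Path:
        path = (self.root / relative).resolve()
        # Reject anything escaping this team's root, e.g. "../battle_blue_<id>/...".
        if not path.is_relative_to(self.root):
            raise PermissionError(f"{relative!r} is outside {self.root.name}")
        return path

    def save_episode(self, round_no: int, episode: dict) -> Path:
        path = self._path(f"episodes/round_{round_no:03d}.json")
        path.write_text(json.dumps(episode, indent=2))
        return path

    def load(self, relative: str) -> dict:
        return json.loads(self._path(relative).read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        red = TeamMemory(Path(tmp), "red", "battle_20250128_221500")
        print(red.save_episode(1, {"actions": ["scan"], "outcome": "3 findings"}))
```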

Termination Conditions

A battle ends when ANY of the following conditions is met:

  1. Null Production: Both teams fail to generate new findings for 3 rounds
  2. Maximum Rounds: Configured limit reached
  3. Metric Convergence: Scores stable for 5 consecutive rounds
  4. Kill Switch: Manual termination via ./run.sh stop
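A minimal sketch of checking these conditions after each round. It assumes "no new findings" means the combined count for both teams is zero, and reads "scores stable" as the cumulative (Red, Blue) totals staying within a tolerance over the window; both readings, and the `BattleState` shape, are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BattleState:
    max_rounds: int
    new_findings: List[int] = field(default_factory=list)  # per round, both teams combined
    scores: List[Tuple[float, float]] = field(default_factory=list)  # cumulative (red, blue) per round
    stop_requested: bool = False  # set when a stop request arrives


def termination_reason(s: BattleState, null_window: int = 3, stable_window: int = 5,
                       tolerance: float = 0.0) -> Optional[str]:
    """Return the first termination condition that holds, or None to keep going."""
    if s.stop_requested:
        return "kill_switch"
    if len(s.scores) >= s.max_rounds:
        return "max_rounds"
    recent = s.new_findings[-null_window:]
    if len(recent) == null_window and not any(recent):
        return "null_production"
    window = s.scores[-stable_window:]
    first_red, first_blue = window[0] if window else (0.0, 0.0)
    if len(window) == stable_window and all(
        abs(r - first_red) <= tolerance and abs(b - first_blue) <= tolerance for r, b in window
    ):
        return "metric_convergence"
    return None
```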

Task Monitor Integration

Battles register with task-monitor for overnight progress tracking:

# View battle progress in TUI
.pi/skills/task-monitor/run.sh tui --filter battle

Report Output

After a battle completes, the skill generates a report containing:

  • Executive Summary: Winner, key metrics, risk score
  • Vulnerability Report: By severity, category, remediation status
  • Attack Evolution: How Red team adapted over rounds
  • Defense Timeline: Blue team improvements over time
  • Recommendations: Prioritized security improvements

Leveraged Skills

| Skill | Team | Purpose |
|---|---|---|
| hack | Red | Scanning, auditing, exploitation |
| anvil | Blue | Multi-agent patching (Thunderdome) |
| memory | Both | Recall prior strategies |
| treesitter | Blue | Code structure analysis |
| taxonomy | Both | Classify findings |
| task-monitor | Orchestrator | Progress tracking |
| docker-ops | Both | Container management |

Example Battle

# Start 100-round battle on current project
./run.sh battle --target . --rounds 100

# Output:
# Battle ID: battle_20250128_221500
# Target: /home/user/project
# Rounds: 100
#
# Registering with task-monitor...
# Starting Round 1/100...
# [Red] Scanning target with hack...
# [Red] Found 3 potential vulnerabilities
# [Blue] Analyzing attack logs...
# [Blue] Generating patch for SQL injection...
# [Blue] Patch applied, running verification...
# Round 1 complete. Red: 3 pts, Blue: 9 pts
# ...
#
# Battle Complete!
# Winner: Blue Team (847 pts vs 423 pts)
# Report: ./reports/battle_20250128_221500.md
