grahama1970

battle

0
0
# Install this skill:
npx skills add grahama1970/agent-skills --skill "battle"

Install specific skill from multi-skill repository

# Description

>

# SKILL.md


name: battle
description: >
Red vs Blue team security competition orchestrator. Runs long-running overnight
battles with 1000s of interactions, scoring, and insight generation.
allowed-tools:
- Bash
- Read
triggers:
- battle
- thunderdome
- red vs blue
- overnight battle
- security competition
- red team vs blue team
metadata:
short-description: Red vs Blue team security competition
requires: docker


Battle Skill

Red vs Blue Team Security Competition Orchestrator

Pits a Red Team (attack) against a Blue Team (defense) in a long-running competitive loop. Each team leverages all .pi/skills to attack or defend a target codebase.

Architecture

Based on research into RvB framework, DARPA AIxCC, and Microsoft PyRIT:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Battle Orchestrator                      β”‚
β”‚  - Game loop (RvB pattern)                              β”‚
β”‚  - Concurrent Red/Blue execution                        β”‚
β”‚  - Entropy-driven termination                           β”‚
β”‚  - Checkpointing for overnight runs                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                              β”‚
    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
    β”‚ Red Team β”‚                   β”‚ Blue Teamβ”‚
    β”‚ (Thread) β”‚                   β”‚ (Thread) β”‚
    β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€                   β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
    β”‚ Skills:  β”‚                   β”‚ Skills:  β”‚
    β”‚ - hack   β”‚                   β”‚ - anvil  β”‚
    β”‚ - memory β”‚                   β”‚ - memory β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                              β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚           Digital Twin              β”‚
    β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
    β”‚  β”‚ Mode: git_worktree          β”‚   β”‚
    β”‚  β”‚   - Red attacks arena       β”‚   β”‚
    β”‚  β”‚   - Blue patches workspace  β”‚   β”‚
    β”‚  β”‚   - Cherry-pick to test     β”‚   β”‚
    β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€   β”‚
    β”‚  β”‚ Mode: docker                β”‚   β”‚
    β”‚  β”‚   - Isolated containers     β”‚   β”‚
    β”‚  β”‚   - Battle network          β”‚   β”‚
    β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€   β”‚
    β”‚  β”‚ Mode: qemu                  β”‚   β”‚
    β”‚  β”‚   - Emulated firmware       β”‚   β”‚
    β”‚  β”‚   - GDB attach points       β”‚   β”‚
    β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Digital Twin Modes

The battle skill supports multiple target types through its Digital Twin system:

1. Source Code (git_worktree)

For battling over git repositories. Creates isolated git worktrees for each team.

./run.sh battle /path/to/repo --rounds 100

2. Docker Container (docker)

For battling over containerized applications. Spins up separate containers for each team.

# Using a Docker image
./run.sh battle --docker-image nginx:latest --rounds 100

# Using a Dockerfile in the target directory
./run.sh battle /path/with/Dockerfile --mode docker

3. Firmware/Microprocessor (qemu)

For battling over firmware and embedded systems. Boots firmware in QEMU emulator.

# Auto-detect architecture from ELF header
./run.sh battle firmware.elf --rounds 100

# Specify machine type explicitly
./run.sh battle firmware.bin --qemu-machine arm
./run.sh battle firmware.bin --qemu-machine riscv64
./run.sh battle bios.rom --qemu-machine x86_64

Supported QEMU machines:
- arm - ARM Cortex-M (STM32, etc.)
- aarch64 - ARM64
- riscv32/riscv64 - RISC-V
- x86_64/i386 - x86
- mips - MIPS (routers, embedded)

4. Copy Mode (fallback)

For non-git directories. Creates simple file copies for each team.

Commands

# Start a battle (10 rounds for testing)
./run.sh battle /path/to/codebase --rounds 10

# Start overnight battle (1000 rounds)
./run.sh battle /path/to/codebase --overnight

# Battle a Docker container
./run.sh battle --docker-image myapp:latest --rounds 100

# Battle firmware with QEMU
./run.sh battle firmware.bin --qemu-machine arm --rounds 100

# Check battle status
./run.sh status

# Resume interrupted battle
./run.sh resume <battle-id>

# Generate report from completed battle
./run.sh report <battle-id>

Scoring System (AIxCC-style)

Metric Weight Description
Vulnerability Discovery 1x Red team finds vulnerability
Exploit Proof +0.5x Red team proves exploitability
Successful Patch 3x Blue team patches vulnerability
Time Decay Variable Faster responses score higher
Functionality Preserved Required Patches must not break code

Scores

  • TDSR (True Defense Success Rate): Vulnerabilities fixed AND code works
  • FDSR (Fake Defense Success Rate): Attack blocked but code broken
  • ASC (Attack Success Count): Total unique exploits discovered

Game Loop (Learning-Based)

Each round follows a learn β†’ act β†’ reflect pattern:

Round k:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    1. RESEARCH PHASE                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Red Team:                      Blue Team:                    β”‚
β”‚ - Recall past attack attempts  - Recall past defenses        β”‚
β”‚ - Query /dogpile for new       - Query /dogpile for          β”‚
β”‚   exploitation techniques        hardening strategies        β”‚
β”‚ - Review opponent's patterns   - Analyze attack evolution    β”‚
β”‚ (Budget: 3 research calls max)                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    2. ACTION PHASE                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Red Team Attack:               Blue Team Defense:            β”‚
β”‚ - Execute learned strategy     - Apply patches via anvil     β”‚
β”‚ - AFL++ fuzzing with coverage  - Verify via QCOW2 overlay    β”‚
β”‚ - Collect crashes/findings     - Run regression tests        β”‚
β”‚ - Tag findings with /taxonomy  - Tag patches with /taxonomy  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   3. REFLECTION PHASE                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Both Teams:                                                  β”‚
β”‚ - Archive round episode (actions, outcomes, learnings)       β”‚
β”‚ - Store successful strategies in /memory                     β”‚
β”‚ - Update belief about opponent's capabilities                β”‚
β”‚ - Evolve strategy for next round                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   4. SCORING & CHECKPOINT                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ - Calculate AIxCC-style scores                               β”‚
β”‚ - Check termination conditions                               β”‚
β”‚ - Save checkpoint (QEMU state + team memories)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Memory Architecture

Each team maintains isolated knowledge:

battle_red_<battle_id>/           battle_blue_<battle_id>/
β”œβ”€β”€ strategies/                   β”œβ”€β”€ strategies/
β”‚   β”œβ”€β”€ successful_attacks        β”‚   β”œβ”€β”€ successful_patches
β”‚   └── failed_attempts           β”‚   └── broken_defenses
β”œβ”€β”€ research/                     β”œβ”€β”€ research/
β”‚   └── dogpile_results           β”‚   └── dogpile_results
β”œβ”€β”€ episodes/                     β”œβ”€β”€ episodes/
β”‚   β”œβ”€β”€ round_001.json            β”‚   β”œβ”€β”€ round_001.json
β”‚   └── round_002.json            β”‚   └── round_002.json
└── taxonomy/                     └── taxonomy/
    β”œβ”€β”€ cwe_classifications       β”œβ”€β”€ mitigation_types
    └── severity_scores           └── effectiveness_scores

Teams cannot access opponent's memory - this creates true adversarial learning.

Termination Conditions

Battle ends when ANY condition is met:

  1. Null Production: Both teams fail to generate new findings for 3 rounds
  2. Maximum Rounds: Configured limit reached
  3. Metric Convergence: Scores stable for 5 consecutive rounds
  4. Kill Switch: Manual termination via ./run.sh stop

Task Monitor Integration

Battles register with task-monitor for overnight progress tracking:

# View battle progress in TUI
.pi/skills/task-monitor/run.sh tui --filter battle

Report Output

After battle completion, generates:

  • Executive Summary: Winner, key metrics, risk score
  • Vulnerability Report: By severity, category, remediation status
  • Attack Evolution: How Red team adapted over rounds
  • Defense Timeline: Blue team improvements over time
  • Recommendations: Prioritized security improvements

Leveraged Skills

Skill Team Purpose
hack Red Scanning, auditing, exploitation
anvil Blue Multi-agent patching (Thunderdome)
memory Both Recall prior strategies
treesitter Blue Code structure analysis
taxonomy Both Classify findings
task-monitor Orchestrator Progress tracking
docker-ops Both Container management

Example Battle

# Start 100-round battle on current project
./run.sh battle --target . --rounds 100

# Output:
# Battle ID: battle_20250128_221500
# Target: /home/user/project
# Rounds: 100
#
# Registering with task-monitor...
# Starting Round 1/100...
# [Red] Scanning target with hack...
# [Red] Found 3 potential vulnerabilities
# [Blue] Analyzing attack logs...
# [Blue] Generating patch for SQL injection...
# [Blue] Patch applied, running verification...
# Round 1 complete. Red: 3 pts, Blue: 9 pts
# ...
#
# Battle Complete!
# Winner: Blue Team (847 pts vs 423 pts)
# Report: ./reports/battle_20250128_221500.md

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.