npx skills add rokoss21/swarm-iosm
Or install specific skill: npx add-skill https://github.com/rokoss21/swarm-iosm
# Description
Orchestrate complex development with AUTOMATIC parallel subagent execution, continuous dispatch scheduling, dependency analysis, file conflict detection, and IOSM quality gates. Analyzes task dependencies, builds critical path, launches parallel background workers with lock management, monitors progress, auto-spawns from discoveries. Use for multi-file features, parallel implementation streams, automated task decomposition, brownfield refactoring, or when user mentions "parallel agents", "orchestrate", "swarm", "continuous dispatch", "automatic scheduling", "PRD", "quality gates", "decompose work", "Mixed/brownfield".
# SKILL.md
name: swarm-iosm
version: 2.1
description: Orchestrate complex development with AUTOMATIC parallel subagent execution, continuous dispatch scheduling, dependency analysis, file conflict detection, and IOSM quality gates. Analyzes task dependencies, builds critical path, launches parallel background workers with lock management, monitors progress, auto-spawns from discoveries. Use for multi-file features, parallel implementation streams, automated task decomposition, brownfield refactoring, or when user mentions "parallel agents", "orchestrate", "swarm", "continuous dispatch", "automatic scheduling", "PRD", "quality gates", "decompose work", "Mixed/brownfield".
user-invocable: true
allowed-tools: Read, Grep, Glob, Bash, Write, Edit, Task, AskUserQuestion, TodoWrite
Swarm Workflow (IOSM)
A structured workflow for complex development tasks that combines PRD-driven planning, parallel subagent execution, and IOSM (Improve→Optimize→Shrink→Modularize) quality gates.
Quick Start
For new features/projects (Greenfield):
/swarm-iosm new-track "Add user authentication with JWT"
For existing codebases (Brownfield):
/swarm-iosm setup
/swarm-iosm new-track "Refactor payment processing module"
Check progress:
/swarm-iosm status
When to Use This Skill
Use Swarm Workflow when:
- Task requires multiple parallel work streams (exploration, implementation, testing, docs)
- Need formal PRD and decomposition for complex features
- Want structured reports and traceability ("who did what and why")
- Brownfield refactoring that needs careful planning and rollback strategy
- Team collaboration requiring artifact-based handoffs
- Quality gates (IOSM) are needed for acceptance
Don't use for:
- Simple single-file changes
- Quick bug fixes
- Exploratory tasks without implementation
Core Commands
/swarm-iosm setup
Initialize project context for Swarm workflow.
What it does:
1. Creates swarm/ directory structure
2. Generates project context files (product.md, tech-stack.md, workflow.md)
3. Initializes tracks.md registry
When to use: First time in a project, or when project context has significantly changed.
/swarm-iosm new-track "<description>"
Create a new feature/task track with PRD and implementation plan.
What it does:
1. Requirements gathering (AskUserQuestion for mode/priorities/constraints)
2. Generate PRD (swarm/tracks/<id>/PRD.md)
3. Create spec (spec.md) and plan (plan.md) with phases/tasks/dependencies
4. Identify subagent roles needed
5. Create metadata.json with track info
Arguments: Brief description of the feature/task (e.g., "Add OAuth2 authentication")
/swarm-iosm implement [track-id]
Execute the implementation plan using parallel subagents.
What it does:
1. Load plan from track
2. Identify parallelizable tasks vs. sequential chains
3. Launch subagents (suggests background for long-running, foreground for interactive)
4. Each subagent produces structured report in reports/
5. Monitor progress and collect outputs
Arguments: Optional track-id (defaults to most recent track)
/swarm-iosm status [track-id]
Show progress summary for a track.
What it does:
1. Parse plan.md for task statuses
2. List completed reports
3. Show blockers and open questions
4. Display dependency chain status
/swarm-iosm watch [track-id]
Open a live monitoring dashboard for a track. (v1.3)
What it does:
1. Calculates real-time metrics (velocity, ETA, progress %)
2. Renders an ASCII progress bar
3. Shows status of all tasks in the track
4. Refreshes data from reports and checkpoints
Example usage:
/swarm-iosm watch
/swarm-iosm simulate [track-id]
Run a dry-run simulation of the implementation plan. (v1.3)
What it does:
1. Loads implementation plan and resource constraints
2. Simulates dispatch loop with virtual time
3. Identifies bottlenecks and potential conflicts
4. Generates ASCII timeline and simulation report
5. Estimates total parallel execution time vs serial
Example usage:
/swarm-iosm simulate
/swarm-iosm simulate 2026-01-17-001
/swarm-iosm resume [track-id]
Resume an interrupted implementation from the latest checkpoint. (v1.3)
What it does:
1. Loads latest checkpoint from checkpoints/latest.json
2. Reconciles state by reading all report files in reports/
3. Identifies completed vs pending tasks
4. Recalculates the ready queue
5. Shows a summary of progress and next steps
Example usage:
/swarm-iosm resume
/swarm-iosm resume 2026-01-17-001
/swarm-iosm retry <task-id> [--foreground] [--reset-brief]
Retry a failed task with optional mode changes. (v1.2)
What it does:
1. Reads error diagnosis from task report using parse_errors.py
2. Shows error diagnosis to user with suggested fixes
3. Asks user to choose: apply fix, manual fix, or skip
4. Regenerates subagent brief with error context
5. Relaunches task using Task tool
6. Tracks retry count (max 3 per task)
Arguments:
- <task-id>: Task to retry (e.g., T04)
- --foreground: Force foreground execution (for interactive debugging)
- --reset-brief: Regenerate brief from scratch (vs. reuse existing)
Error-specific behaviors:
- Permission Denied: Always suggest --foreground
- MCP Tool Unavailable: Force foreground mode
- Import Error: Suggest pip install before retry
- Test Failed: Ask user: "Fix code or update tests?"
Example usage:
/swarm-iosm retry T04
/swarm-iosm retry T04 --foreground
/swarm-iosm retry T04 --reset-brief
Inter-Agent Communication (v2.0)
Subagents can share knowledge via shared_context.md.
Protocol:
1. Subagent discovers a pattern (e.g., "Use schemas.py for all models").
2. Subagent writes to "Shared Context Updates" in their report.
3. Orchestrator runs merge_context.py to update shared_context.md.
4. Subsequent subagents read shared_context.md in their brief.
Example Report Update:
## Shared Context Updates
- [Error Handling]: Always wrap API calls in `try/except ApiError`.
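The protocol above can be sketched in a few lines of Python. This is only an illustration of the idea, not the contents of merge_context.py: the function names `extract_shared_context_updates` and `merge_updates` are hypothetical, and the sketch assumes that updates are plain bullet lines under a `## Shared Context Updates` heading.

```python
def extract_shared_context_updates(report_text: str) -> list[str]:
    """Return the bullet lines found under '## Shared Context Updates'."""
    updates = []
    in_section = False
    for line in report_text.splitlines():
        if line.startswith("## "):
            # Any level-2 heading opens or closes the section of interest.
            in_section = line.strip() == "## Shared Context Updates"
            continue
        if in_section and line.lstrip().startswith("- "):
            updates.append(line.strip())
    return updates


def merge_updates(shared_context: str, updates: list[str]) -> str:
    """Append updates that are not already present, keeping their order."""
    lines = shared_context.splitlines()
    seen = {line.strip() for line in lines}
    for update in updates:
        if update not in seen:
            lines.append(update)
            seen.add(update)
    return "\n".join(lines) + "\n"
```

Merging the same report twice leaves shared_context.md unchanged, which keeps the step safe to rerun after a resume.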
/swarm-iosm integrate <track-id>
Collect subagent reports and create integration plan.
What it does:
1. Read all reports from swarm/tracks/<id>/reports/
2. Identify conflicts and resolution strategy
3. Generate integration_report.md with merge order
4. Run IOSM quality gates
5. Create iosm_report.md with gate results and IOSM-Index
/swarm-iosm revert-plan <track-id>
Generate rollback guide for a track (does not execute git revert).
What it does:
1. Analyze files touched (from reports)
2. Identify commits/changes to revert
3. Suggest checkpoint/branch strategy
4. Create rollback_guide.md with manual steps
Advanced Features (v2.0)
Task Dependencies Visualization (--graph)
Generate a Mermaid diagram of the task dependency graph.
Usage:
/swarm-iosm simulate --graph
Generates dependency_graph.mermaid.
Anti-Pattern Detection
The planner automatically checks for:
- Monolithic tasks (XL + many touches)
- Low parallelism (<1.2x speedup)
- Missing quality gates
- Circular dependencies
Warnings appear in simulate and validate output.
Template Customization
You can override standard templates by placing files in swarm/templates/.
Resolution Order:
1. swarm/templates/<name> (Project-specific)
2. .claude/skills/swarm-iosm/templates/<name> (Skill defaults)
Supported Templates:
- prd.md, plan.md, subagent_brief.md, subagent_report.md
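As a rough sketch of the resolution order above (the helper name `resolve_template` and the `project_root` parameter are assumptions made for illustration, not part of the skill), a lookup could try the project-specific folder first and fall back to the skill defaults:

```python
from pathlib import Path


def resolve_template(name: str, project_root: Path) -> Path:
    """Return the first existing template, preferring the project override."""
    candidates = [
        project_root / "swarm" / "templates" / name,  # 1. project-specific
        project_root / ".claude" / "skills" / "swarm-iosm" / "templates" / name,  # 2. skill default
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"No template named {name!r}; searched: {searched}")
```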
Resource Constraints & Cost Control
Define limits in plan.md or metadata to prevent overload.
Defaults:
- Max Parallel Background: 6
- Max Parallel Foreground: 2
- Max Total: 8
- Cost Limit: $10.00
Model Selection:
- Auto-select: Haiku (read-only), Sonnet (standard), Opus (security/arch).
Instructions for Claude
ORCHESTRATOR RESPONSIBILITIES
CRITICAL: The main agent (Claude) acts as ORCHESTRATOR ONLY. You coordinate subagents but DO NOT do implementation work yourself.
MANDATORY RULES
✅ ORCHESTRATOR DOES:
- Analyze & Plan
  - Parse plan.md and build dependency graph
  - Generate orchestration_plan.md with waves/critical path
  - Detect file conflicts and resolve scheduling
- Launch Subagents
  - Create detailed briefs for each subagent (using templates)
  - Launch parallel waves in single message (multiple Task tool calls)
  - Default to background mode (unless interactive)
  - Pre-resolve all questions for background tasks
- Monitor & Handle Blockers
  - Use /bashes to track background tasks
  - Resume stuck tasks in foreground if needed
  - Apply fallback strategy (retry → resume → recovery task)
- Integrate & Gate
  - Collect all subagent reports
  - Resolve merge conflicts
  - Run IOSM quality gates
  - Generate integration_report.md and iosm_report.md
- Meta-work (ONLY exception to "no implementation")
  - Update plan.md status
  - Fix metadata (metadata.json, tracks.md)
  - Resolve integration conflicts (merge reports)
  - Generate final reports/docs
❌ ORCHESTRATOR NEVER DOES:
- Implementation work:
  - ❌ Write application code (services, models, API, UI)
  - ❌ Write tests (unit, integration, performance)
  - ❌ Refactor existing code
- Analysis work:
  - ❌ Explore codebase (that's Explorer's job)
  - ❌ Design architecture (that's Architect's job)
  - ❌ Run security scans (that's SecurityAuditor's job)
- Specialized work:
  - ❌ Write documentation (that's DocsWriter's job)
  - ❌ Debug performance (that's PerfAnalyzer's job)
Exception: If a task is trivial (<5 min) meta-work (e.g., add entry to tracks.md), orchestrator MAY do it. But if it's real logic/code → delegate.
ORCHESTRATION WORKFLOW
Phase 0: Requirements Intake
↓
Phase 1: PRD Generation
↓
Phase 2: Decomposition & Planning (create plan.md)
↓
[NEW] Phase 2.5: Orchestration Planning ← AUTOMATIC
↓
Phase 3: Subagent Execution (CONTINUOUS DISPATCH) ← v1.1
↓
Phase 4: Integration & IOSM Gates
↓
Phase 5: Deployment Prep
CONTINUOUS DISPATCH LOOP (v1.1 — MANDATORY)
Key change in v1.1: the orchestrator runs in continuous scheduling mode. As soon as a task becomes READY, it is launched immediately, without waiting for the "end of the wave".
Core principle
"Work in continuous scheduling mode: as soon as a READY task appears with no touches conflicts and no needs_user_input, launch it in the background immediately, even if other tasks are still running. After each batch, collect SpawnCandidates from the reports and automatically add them to the backlog. Continue the loop until the specified IOSM Gate targets are reached."
Continuous Orchestration Loop
LOOP (until Gate targets are reached):
1. CollectReady()
└─── Collect tasks whose deps are satisfied
2. Classify()
└─── Assign each task a mode:
- background: safe, no user input needed
- foreground: needs user decision
- blocked_user: needs_user_input=true, cannot be auto-resolved
- blocked_conflict: touches overlap with a running task
3. ConflictCheck()
└─── Launch in parallel ONLY tasks whose touches do not overlap (for writes)
└─── Read-only tasks can ALWAYS run in parallel
4. DispatchBatch()
└─── Launch READY tasks in ONE MESSAGE (max 3-6 per batch)
└─── Priority: critical_path > high_severity_spawn > read-only_fillers
└─── Each batch gets a batch_id for tracking
└─── Do not wait for the "end of the wave"; dispatch immediately
5. Monitor()
└─── Periodically read the outputs of background tasks
└─── Collect SpawnCandidates from reports
6. AutoSpawn()
└─── If SpawnCandidates are found → create new tasks
└─── Add them to the backlog and return to step 1
7. GateCheck()
└─── Check the Gate-I/M/O/S conditions
└─── If reached → stop + gate-report
└─── If not → auto-spawn remediation tasks and continue
END LOOP
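Steps 1 to 4 of the loop can be illustrated with a small batch selector. This is a simplified sketch under stated assumptions: the `Task` dataclass and `select_batch` function are hypothetical names, touches are compared by exact path equality only, and the priority ordering from step 4 is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    deps: set[str] = field(default_factory=set)
    touches: set[str] = field(default_factory=set)
    read_only: bool = False
    needs_user_input: bool = False


def select_batch(tasks: dict[str, Task], done: set[str], running: set[str],
                 max_batch: int = 6) -> list[str]:
    """Pick tasks that can be dispatched right now, without waiting for a wave."""
    # Paths held by running write tasks; read-only tasks take no lock.
    locked: set[str] = set()
    for task_id in running:
        if not tasks[task_id].read_only:
            locked |= tasks[task_id].touches

    batch: list[str] = []
    for task in tasks.values():
        if len(batch) >= max_batch:
            break
        if task.id in done or task.id in running:
            continue
        if not task.deps <= done:        # CollectReady: deps not satisfied yet
            continue
        if task.needs_user_input:        # Classify: blocked_user
            continue
        if not task.read_only:
            if task.touches & locked:    # ConflictCheck: blocked_conflict
                continue
            locked |= task.touches       # reserve paths for this batch
        batch.append(task.id)
    return batch
```

Calling `select_batch` again after every task completion, rather than once per wave, is what makes the dispatch continuous.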
Task States (internal tracking)
| State | Description |
|---|---|
| backlog | All known tasks |
| ready | Deps satisfied, can be launched |
| running | In progress (background or foreground) |
| blocked_user | needs_user_input=true, waiting for a decision |
| blocked_conflict | touches are held by another running task |
| done | Completed |
Rule: If a task becomes READY while other tasks are running, launch it right away; do not wait for a checkpoint.
Touches Lock Manager
For safe parallelism, the orchestrator must track which files are "held":
touches_lock: Set[path] = {}
When launching a task:
1. Check: task.touches ∩ touches_lock == ∅ ?
2. If yes → touches_lock.add(task.touches), launch
3. If no → blocked_conflict, wait for release
When a task completes:
1. touches_lock.remove(task.touches)
2. Recompute ready_queue (which tasks were unblocked?)
Conflict rules:
- read-only tasks → always parallel (they take no lock)
- write-local → parallel if touches do not overlap
- write-shared → strictly sequential
Lock Granularity (v1.1.1)
Conflict hierarchy:
A FOLDER lock (core/) conflicts:
├── with any lock inside it (core/a.py, core/b.py)
└── with a lock on the folder itself (core/)
A FILE lock (core/a.py) conflicts:
├── only with the same file
└── with a lock on a parent folder (core/)
Path normalization:
- Always use / (forward slash)
- Strip the trailing slash (core/ → core)
- Convert to lowercase (for Windows)
- Use paths relative to the project root
Example conflict check:
def conflicts(lock_a: str, lock_b: str) -> bool:
    a, b = normalize(lock_a), normalize(lock_b)
    return a == b or a.startswith(b + '/') or b.startswith(a + '/')
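The conflict check relies on a `normalize` helper that is not shown. A minimal self-contained sketch follows, assuming `normalize` only applies the normalization rules listed above; the `TouchesLock` class and its method names are hypothetical and illustrate acquire/release bookkeeping, not the orchestrator's actual state.

```python
def normalize(path: str) -> str:
    """Forward slashes, no trailing slash, lowercase (relative paths assumed)."""
    return path.replace("\\", "/").rstrip("/").lower()


def conflicts(lock_a: str, lock_b: str) -> bool:
    a, b = normalize(lock_a), normalize(lock_b)
    # Same path, or one path is a folder that contains the other.
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


class TouchesLock:
    """Tracks which normalized paths each running task currently holds."""

    def __init__(self) -> None:
        self._held: dict[str, set[str]] = {}

    def try_acquire(self, task_id: str, touches: list[str]) -> bool:
        wanted = {normalize(p) for p in touches}
        for held_paths in self._held.values():
            if any(conflicts(w, h) for w in wanted for h in held_paths):
                return False  # caller marks the task blocked_conflict
        self._held[task_id] = wanted
        return True

    def release(self, task_id: str) -> None:
        # Called on task completion; the ready queue should then be recomputed.
        self._held.pop(task_id, None)
```

Appending "/" before the prefix test is what keeps `core` from conflicting with an unrelated sibling such as `corelib/x.py`.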
Read-Only Safety Rules
Problem: "read-only" tasks can accidentally write to caches, lockfiles, or __pycache__.
Solution: read-only tasks MUST:
1. NOT run commands that modify files (npm install, pip install)
2. Write temporary artifacts ONLY to swarm/tracks/<id>/scratch/
3. Use the --dry-run and --check flags where possible
scratch_dir rule:
swarm/tracks/<track-id>/scratch/ ← read-only tasks write here
├── T00_analysis.json
├── T03_coverage.xml
└── ...
This folder does NOT require a lock and does NOT conflict with anything.
Auto-Background Classification
The orchestrator classifies tasks automatically:
Auto-background (safe, launch without asking):
- Concurrency class = read-only
- Or write-local + needs_user_input=false + no policy conflicts
- effort >= M and no choice points
Auto-foreground (the user is needed):
- The API contract/response format changes
- A "source of truth" is needed (sources, business logic, astrology)
- Tests fail and someone must decide whether to "fix the code or the test"
- High-risk changes without tests
- needs_user_input=true
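A subset of these rules can be written as a small decision function. The sketch below is an assumption-laden illustration: `classify` and its parameters are hypothetical names, it ignores the effort and risk criteria, and it treats every case the rules do not mention (for example write-shared) as foreground, which is a conservative choice made here rather than something the rules state.

```python
def classify(concurrency: str, needs_user_input: bool,
             policy_conflict: bool = False) -> str:
    """Return "background" or "foreground" for a task."""
    if needs_user_input:
        return "foreground"
    if concurrency == "read-only":
        return "background"
    if concurrency == "write-local" and not policy_conflict:
        return "background"
    # Assumption: anything not covered above is handled conservatively.
    return "foreground"
```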
SpawnCandidates Protocol
Every subagent MUST include a SpawnCandidates section in its report:
## SpawnCandidates
New work items discovered during this task:
| ID | Subtask | Touches | Effort | User Input | Severity | Dedup Key | Accept Criteria |
|----|---------|---------|--------|------------|----------|-----------|-----------------|
| SC-01 | Fix missing type annotation in auth.py | `backend/auth.py` | S | false | medium | auth.py\|type-annot | mypy passes |
| SC-02 | Clarify API contract for /natal/aspects | `docs/api_spec.yaml` | M | true | high | api_spec\|contract | Contract approved |
Dedup Key format: <primary_touch>|<intent_category>
- Used to deduplicate identical candidates reported by different workers
The orchestrator must:
1. After each task completion, read SpawnCandidates
2. Deduplicate by dedup_key (the first one wins)
3. If needs_user_input=false and severity != critical → auto-spawn
4. If needs_user_input=true → add to the blocked_user queue
5. Run the new tasks through the planner and dispatch
Spawn Protection (v1.1.1)
Protection against unbounded task spawning:
(A) Spawn Budget
Track in iosm_state.md:
## Spawn Budget
- spawn_budget_total: 20
- spawn_budget_used: 7
- spawn_budget_remaining: 13
- spawn_budget_per_gate:
- Gate-I: 5 (used: 2)
- Gate-O: 8 (used: 3)
- Gate-M: 4 (used: 2)
- Gate-S: 3 (used: 0)
Rules:
- When the budget is exhausted → STOP and ask the user
- severity=critical ignores the budget (always spawn)
- The user can increase the budget with a command
(B) Dedup Rules
from typing import Set

def dedup_key(candidate) -> str:
    # <primary_touch>|<intent_category>
    return f"{candidate.touches[0]}|{candidate.intent_category}"

# The orchestrator keeps:
seen_dedup_keys: Set[str] = set()

# When processing a SpawnCandidate:
key = dedup_key(candidate)
if key in seen_dedup_keys:
    pass  # duplicate: skip it
else:
    seen_dedup_keys.add(key)
    process(candidate)
(C) Severity Threshold
| Severity | Auto-spawn condition |
|---|---|
| critical | ALWAYS (even if budget=0), STOP loop and alert |
| high | If a gate fails OR the user requested it |
| medium | If a gate fails AND budget > 0 |
| low | Only on explicit user request |
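The severity table can be read as a small decision function. In this sketch, `spawn_decision` and its parameters are hypothetical names; it returns a pair `(spawn, stop_loop)` and encodes only what the table states, so how budget exhaustion affects high-severity candidates is deliberately left out.

```python
def spawn_decision(severity: str, gate_failed: bool, budget_remaining: int,
                   user_requested: bool = False) -> tuple[bool, bool]:
    """Return (spawn, stop_loop) for a SpawnCandidate, following the table."""
    if severity == "critical":
        return True, True                 # always spawn, stop the loop and alert
    if severity == "high":
        return gate_failed or user_requested, False
    if severity == "medium":
        return gate_failed and budget_remaining > 0, False
    if severity == "low":
        return user_requested, False
    raise ValueError(f"unknown severity: {severity!r}")
```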
(D) Anti-Loop Protection
## Anti-Loop Metrics (in iosm_state.md)
- loops_without_progress: 0 # reset on any task completion
- max_loops_without_progress: 3
- total_loop_iterations: 15
- max_total_iterations: 50
Rule: If loops_without_progress >= 3 → STOP and analyze why the loop is stuck
Model Selection & Cost (v1.2)
Model Selection Rules:
- haiku: read-only tasks ($0.25/M tokens)
- sonnet: standard tasks, background automation ($3.00/M tokens)
- opus: security audits, critical architecture, user decisions ($15.00/M tokens)
Cost Tracking:
Orchestrator tracks cost in iosm_state.md:
- Estimate: Calculated from Effort field (S=5k, M=20k, L=50k, XL=100k tokens)
- Actual: Sum of tokens reported by subagent (if available) or estimate if not
Budget Control:
- Default limit: $10.00 per track
- Warn @ 80% ($8.00): Notify user
- Stop @ 100% ($10.00): Pause execution, ask user to increase budget or prune tasks
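The estimate and the budget thresholds reduce to simple arithmetic: estimated cost = tokens for the Effort size ÷ 1,000,000 × price per million tokens. A minimal sketch using the figures listed above (the function names `estimate_cost` and `budget_status` are hypothetical, and a single flat price per model is assumed):

```python
EFFORT_TOKENS = {"S": 5_000, "M": 20_000, "L": 50_000, "XL": 100_000}
PRICE_PER_MTOK = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}  # USD per 1M tokens


def estimate_cost(effort: str, model: str) -> float:
    """Estimated cost in USD for one task of the given effort on the given model."""
    return EFFORT_TOKENS[effort] / 1_000_000 * PRICE_PER_MTOK[model]


def budget_status(spent: float, limit: float = 10.00, warn_ratio: float = 0.8) -> str:
    """Return "ok", "warn" (notify the user) or "stop" (pause execution)."""
    if spent >= limit:
        return "stop"
    if spent >= limit * warn_ratio:
        return "warn"
    return "ok"
```

For example, an M task on sonnet is estimated at 20,000 tokens, or roughly $0.06, and an XL task on opus at roughly $1.50.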
Gate-Driven Continuation
The orchestrator continues the LOOP until the Gate targets are reached:
Update iosm_state.md after every batch:
# IOSM State — [Track ID]
**Updated:** 2026-01-17 15:30
**Status:** IN_PROGRESS
## Gate Targets (from plan.md)
- Gate-I: ≥0.75 (current: 0.68) ❌
- Gate-M: pass (current: pass) ✅
- Gate-O: tests pass (current: 3 failing) ❌
- Gate-S: N/A
## Auto-Spawn Queue
Based on gate gaps, auto-spawning:
- T15: "Improve naming clarity in core/calculator.py" (Gate-I gap)
- T16: "Fix 3 failing integration tests" (Gate-O gap)
## Blocking Questions (needs user)
- Q1: Should we fix test_natal_aspects.py or update expected values?
## Next Actions
Waiting for T15, T16 to complete. Then re-evaluate gates.
Continuation rules:
- If Gate-I is below the threshold → auto-spawn "Improve clarity / reduce duplication"
- If Gate-O does not pass → auto-spawn "fix failing tests"
- If Gate-M does not pass → auto-spawn "remove circular import / clarify boundaries"
- Continue until the gates are reached
Stop Conditions
The orchestrator MUST stop and ask the user if:
- All remaining tasks have needs_user_input=true: nothing can be done autonomously
- Contradiction: "fix code vs fix tests" with no policy in place
- High-risk: a business-logic change with no source/reference to check against
- Scope creep: auto-spawn goes beyond the scope of the PRD
- Critical severity: a SpawnCandidate with severity=critical
RETRY WORKFLOW (v1.2)
When user invokes /swarm-iosm retry <task-id>:
1. Load error diagnosis:
from pathlib import Path
from parse_errors import parse_subagent_errors
report_path = Path(f"swarm/tracks/{track_id}/reports/{task_id}.md")
diagnoses = parse_subagent_errors(report_path, task_id)
2. Show diagnosis to user:
Present each error with:
- Error type (e.g., "Permission Denied")
- Affected file
- Root reason
- Suggested fixes (from error diagnosis)
3. User chooses action:
Use AskUserQuestion with options:
- "Apply suggested fix" (if automatic fix available)
- "Manual fix required" (user does it manually)
- "Skip and continue" (mark task as failed)
4. Regenerate brief:
Create new brief with:
- All original brief content
- New "Previous Attempt" section:
## Previous Attempt (Failed)
This task was attempted before and failed with:
**Error:** Permission Denied
**File:** backend/migrations/001.sql
**Reason:** Database user lacks CREATE TABLE permission
**What was attempted:** Direct migration execution
**What to do differently:**
1. Grant permissions first, OR
2. Run as admin user, OR
3. Break into smaller steps
- New "Special Instructions" based on error type
- Error-specific context (files, commands, etc.)
5. Relaunch:
Task(
    subagent_type="iosm-engineering-agent",
    prompt=updated_brief,
    run_in_background=("--foreground" not in user_command)
)
6. Update state:
- In iosm_state.md, mark task as RETRY_IN_PROGRESS
- Track retry_count in task metadata
- If retry_count >= 3, mark as PERMANENTLY_FAILED
Retry Limits
- Max 3 retries per task
- After 3rd failure: mark as PERMANENTLY_FAILED
- Requires manual intervention to proceed
Error-Specific Retry Strategies
| Error Type | Auto-Fix | Mode | Notes |
|---|---|---|---|
| Permission Denied | No | foreground | User must grant permissions |
| Import Error | Yes (pip install) | background | Try install first |
| Test Failed | No | foreground | User decision: fix code or tests |
| MCP Tool Unavailable | No | foreground | Background can't use MCP |
| File Not Found | Maybe | foreground | Check dependency task |
| Timeout | No | foreground | May need effort increase |
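The table and the retry limit could be combined into a lookup like the one below. This is a sketch, not the skill's implementation: `RetryStrategy`, `RETRY_STRATEGIES`, and `plan_retry` are hypothetical names, and falling back to foreground for unlisted error types is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryStrategy:
    auto_fix: str  # "yes", "no" or "maybe", as in the table
    mode: str      # "foreground" or "background"


RETRY_STRATEGIES = {
    "Permission Denied": RetryStrategy("no", "foreground"),
    "Import Error": RetryStrategy("yes", "background"),
    "Test Failed": RetryStrategy("no", "foreground"),
    "MCP Tool Unavailable": RetryStrategy("no", "foreground"),
    "File Not Found": RetryStrategy("maybe", "foreground"),
    "Timeout": RetryStrategy("no", "foreground"),
}
MAX_RETRIES = 3


def plan_retry(error_type: str, retry_count: int,
               force_foreground: bool = False) -> tuple[str, Optional[str]]:
    """Return (new_task_state, launch_mode); the mode is None when no retry is allowed."""
    if retry_count >= MAX_RETRIES:
        return "PERMANENTLY_FAILED", None
    strategy = RETRY_STRATEGIES.get(error_type)
    if force_foreground or strategy is None:
        # --foreground was passed, or the error type is not in the table (assumed default).
        return "RETRY_IN_PROGRESS", "foreground"
    return "RETRY_IN_PROGRESS", strategy.mode
```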
Wave Checkpoints (not barriers)
Waves remain for reporting and checkpoints, but NOT for blocking:
Wave 1: [T01, T02] (checkpoint for Gate-I review)
Wave 2: [T03, T04, T05] (checkpoint for Gate-M review)
Wave 3: [T06, T07] (checkpoint for Gate-O review)
But: if T03 finished before T02 and T04 depends_on T03, launch T04 right away; do not wait for the Wave 2 checkpoint.
PHASE 2.5: ORCHESTRATION PLANNING (AUTOMATIC)
Goal: Transform plan.md into executable orchestration_plan.md with waves, modes, conflict resolution.
When: After plan.md is created, before launching subagents.
Steps:
- Validate plan.md has required fields:
python .claude/skills/swarm-iosm/scripts/orchestration_planner.py swarm/tracks/<id>/plan.md --validate
Check all tasks have:
- Touches (files/folders)
- Needs user input (true/false)
- Effort (S/M/L/XL or minutes)
If missing: Tasks without these fields CANNOT be auto-scheduled. Ask user to add them OR infer from context.
- Generate orchestration plan:
python .claude/skills/swarm-iosm/scripts/orchestration_planner.py swarm/tracks/<id>/plan.md --generate
This creates swarm/tracks/<id>/orchestration_plan.md with:
- Dependency graph
- Critical path (longest path through dependencies)
- Execution waves (parallel grouping)
- File conflict matrix
- Background readiness checklist
- Time estimates (serial vs parallel)
- Review with user:
Show orchestration plan summary:
```
Generated orchestration plan:
- 5 waves (14 tasks total)
- Wave 1: 1 task (Explorer, background)
- Wave 2: 3 tasks parallel (Architects, foreground)
- Wave 3: 3 tasks parallel (Implementers, background)
- Wave 4: 3 tasks (Tests, background)
- Wave 5: 3 tasks (Integration, mixed)
Estimated time: 27-42h parallel (vs 60-80h serial)
Speedup: ~1.8x
Ready to execute? (yes/no)
```
- Pre-resolve questions for background tasks:
For each task marked needs_user_input: false that you suspect may still need decisions:
- Use AskUserQuestion NOW (before launching)
- Document answers in subagent brief
Example:
```
Wave 3 has 3 background implementers.
Before launching background tasks, let me clarify:
[AskUserQuestion with 2-3 questions about API design, error handling, testing strategy]
These answers will be included in subagent briefs so they can work autonomously.
```
Output: orchestration_plan.md ready, all questions resolved, ready for Phase 3 execution.
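To make the terms "execution waves", "critical path", and "serial vs parallel" concrete, here is a minimal sketch of how they could be derived from task dependencies. It is not the logic of orchestration_planner.py: `plan_waves`, `deps`, and `effort_hours` are hypothetical names, the graph is assumed to be acyclic, and resource limits and file conflicts are ignored.

```python
from functools import lru_cache


def plan_waves(deps: dict[str, set[str]], effort_hours: dict[str, float]):
    """Return (waves, critical_path, serial_hours, parallel_hours)."""

    @lru_cache(maxsize=None)
    def level(task: str) -> int:
        # Wave number: one more than the deepest prerequisite.
        return 1 + max((level(d) for d in deps[task]), default=0)

    @lru_cache(maxsize=None)
    def finish(task: str) -> float:
        # Earliest finish time if every task starts as soon as its deps are done.
        return effort_hours[task] + max((finish(d) for d in deps[task]), default=0)

    grouped: dict[int, list[str]] = {}
    for task in deps:
        grouped.setdefault(level(task), []).append(task)
    waves = [grouped[k] for k in sorted(grouped)]

    # Critical path: walk back from the latest-finishing task through
    # its latest-finishing prerequisite.
    path = [max(deps, key=finish)]
    while deps[path[-1]]:
        path.append(max(deps[path[-1]], key=finish))
    path.reverse()

    serial = sum(effort_hours.values())
    parallel = finish(path[-1])
    return waves, path, serial, parallel
```

Dividing `serial` by `parallel` gives the speedup figure; a plan whose ratio falls below 1.2 would match the "low parallelism" anti-pattern.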
Phase 0: Requirements Intake (Universal)
When user invokes /swarm-iosm new-track or triggers this Skill:
- Determine mode using AskUserQuestion:
  - Greenfield (new feature from scratch)
  - Brownfield (modify existing codebase)
- If Brownfield: Suggest Plan mode first:
  "I recommend starting in Plan mode (read-only exploration) to safely analyze the codebase before making changes. Shall I proceed with Plan mode first?"
  - If yes: Use Task tool with Explore agent to map codebase
  - If no: Proceed with caution warnings
- Gather requirements using AskUserQuestion for:
  - Priority: Speed / Quality / Cost
  - Change strictness: Safe (minimal changes) / Normal / Aggressive refactor
  - Test strategy: TDD (tests first) / Post-tests / Smoke only
  - Permissions: What tools/operations are allowed
- Ask text questions for:
  - Goal: "What defines 'done' for this task? (1-2 sentences)"
  - Context: "Product/users/environment context?"
  - Constraints: "Tech stack, versions, deadlines, restrictions?"
  - Interfaces: "API/UI/CLI changes needed?"
  - Data: "Data sources, migrations, PII concerns?"
  - Risks: "What could go wrong?"
  - Definition of Done: "Tests? Docs? Deployment?"
- Save intake to swarm/tracks/<track-id>/intake.md
Phase 1: PRD Generation
Using intake data, generate swarm/tracks/<track-id>/PRD.md following template:
# PRD: <Feature Name>
## 1. Problem
## 2. Goals / Non-goals
## 3. Users & Use-cases
## 4. Scope (MVP / Later)
## 5. Requirements
### Functional
### Non-functional
## 6. UX / API / Data
## 7. Risks & Mitigations
## 8. Acceptance Criteria
## 9. Rollout / Migration plan
## 10. IOSM Targets (Gates + expected index delta)
See templates/prd.md for detailed template.
Phase 2: Decomposition & Planning
From PRD, create spec.md and plan.md:
spec.md (Conductor-style):
- Context
- What / Why
- Constraints
- Out of scope
- Acceptance tests
- Artifacts to produce
- Rollback assumptions
plan.md (WBS with dependencies):
- Phases (0: Intake, 1: Design, 2: Implementation, 3: Verification, 4: Integration)
- Tasks with:
- owner_role (Explorer/Architect/Implementer/TestRunner/etc)
- depends_on (task IDs)
- files_modules (scope)
- acceptance criteria
- artifacts (reports/T01.md, etc)
- iosm_checks (which gates apply)
- status (TODO/DOING/DONE/BLOCKED)
See templates/plan.md for structure.
Phase 3: Subagent Execution
Goal: Execute orchestration_plan.md using parallel waves of subagents.
CRITICAL: Launch subagents in PARALLEL WAVES, not one-by-one.
Standardized Subagent Roles
Use these predefined roles:
- Explorer (brownfield analysis)
  - Tools: Read, Grep, Glob
  - Output: Architecture map, dependencies, test coverage, code style
  - When: Always for brownfield, before making changes
- Architect (design decisions)
  - Tools: Read, Write (ADRs)
  - Output: ADR documents, interface contracts, API specs
  - When: Complex features, API changes, architectural decisions
- Implementer-{A,B,C} (parallel implementation)
  - Tools: Read, Write, Edit, Bash (tests)
  - Output: Code changes, unit tests, implementation report
  - When: Independent modules that can be developed in parallel
- TestRunner (verification)
  - Tools: Read, Bash, Write
  - Output: Test results, coverage report, failure analysis
  - When: After implementation, before integration
- SecurityAuditor (security review)
  - Tools: Read, Grep, Bash (security scanners)
  - Output: Security findings, remediation suggestions
  - When: Auth/payment features, external APIs, data handling
- PerfAnalyzer (performance review)
  - Tools: Read, Bash (profiling)
  - Output: Performance metrics, bottleneck analysis
  - When: Data processing, APIs, high-traffic features
- DocsWriter (documentation)
  - Tools: Read, Write, Edit
  - Output: README updates, API docs, user guides
  - When: Public APIs, complex features, user-facing changes
Parallelization Rules:
✅ Parallel (can run simultaneously):
- Different modules/files with no shared state
- Independent research tasks (Explorer on different subsystems)
- Docs + Implementation (if API is stable)
- Multiple Implementers on separate components
❌ Sequential (must run in order):
- Tasks with dependencies (Architect → Implementer)
- Shared file modifications (two agents editing same file)
- Test → Fix → Re-test cycles
Background vs Foreground:
Use background (run_in_background: true in Task tool) when:
- Long-running operations (tests, builds, analysis)
- No user input needed (all questions resolved upfront)
- Permissions pre-approved
- Can tolerate "fire and forget" mode
Use foreground (default) when:
- Need user clarifications during execution
- Interactive debugging/problem-solving
- Permission escalations expected
- Results needed immediately for next step
IMPORTANT: Background subagents cannot use AskUserQuestion (tool call will fail). Resolve all questions BEFORE launching background tasks.
Background Limitations (CRITICAL)
Background subagents CANNOT reliably use:
| Tool/Feature | Status | Reason |
|---|---|---|
| AskUserQuestion | BLOCKED | Auto-denied, no user interaction |
| Permission prompts | BLOCKED | Auto-denied, may fail silently |
| MCP tools | UNSTABLE | May be unavailable in background context |
| External APIs | RISKY | Network errors not recoverable |
| Long git operations | RISKY | May timeout or conflict |
Rule of thumb:
- Background = autonomous code/tests/read/local-only operations
- Foreground = MCP, external integrations, user decisions, risky operations
Pre-flight checklist for background tasks:
1. All questions pre-resolved in brief
2. No MCP tools required
3. No external API calls (or wrap with fallback)
4. No interactive permissions needed
5. Touches clearly defined (no surprises)
If task needs MCP or external calls → force foreground:
- **Needs user input:** true ← even if technically "safe"
- **Note:** Requires MCP/external API, must run foreground
Step 1: Load Orchestration Plan
Read swarm/tracks/<id>/orchestration_plan.md to understand:
- How many waves
- Which tasks in each wave
- Which tasks are parallel vs sequential
- Which tasks are background vs foreground
Step 2: Execute Waves (ONE WAVE AT A TIME)
For each wave in the orchestration plan:
A. Prepare Subagent Briefs
For each task in the wave:
1. Generate brief using templates/subagent_brief.md
2. Fill in all sections:
- Goal, Scope, Context
- Dependencies (what previous tasks delivered)
- Constraints (technical, performance, security)
- Output contract (code + tests + report)
- Verification steps
- Acceptance criteria
- Pre-resolved questions (for background tasks)
- IOSM checks to pass
3. Include report template requirement:
   You MUST save report to: swarm/tracks/<id>/reports/<task-id>.md
   Use template: .claude/skills/swarm-iosm/templates/subagent_report.md
B. Launch Wave (CRITICAL: PARALLEL IN SINGLE MESSAGE)
For parallel tasks in wave:
Launch ALL tasks in wave SIMULTANEOUSLY using single message with multiple Task tool calls.
Example (Wave 3: 3 implementers):
I'm launching Wave 3 with 3 parallel implementers (all background):
[Single message with 3 Task tool calls]
Task 1 (T04 - Implementer-A):
- subagent_type: general-purpose
- description: Implement core business logic
- prompt: [Full brief for T04]
- run_in_background: true
Task 2 (T05 - Implementer-B):
- subagent_type: general-purpose
- description: Implement API endpoints
- prompt: [Full brief for T05]
- run_in_background: true
Task 3 (T06 - Implementer-C):
- subagent_type: general-purpose
- description: Implement data access layer
- prompt: [Full brief for T06]
- run_in_background: true
Monitoring: Use /bashes to track progress
Expected completion: 8-12 hours
NEVER launch tasks one-by-one if they can run parallel. ALWAYS use single message.
C. Monitor Progress
While wave is running:
- Check background tasks periodically: /bashes
- Check task output files (if provided): tail -n 50 /path/to/task/output/file
- If task completes:
  - Verify report exists: swarm/tracks/<id>/reports/T##.md
  - Check acceptance criteria met
  - Mark status in plan.md: Status: DONE
- If task blocks/fails:
  - Apply fallback strategy (see below)
D. Fallback Strategy (if subagent fails)
Scenario 1: Transient error (timeout, network)
- Action: Retry once automatically
- Command: Re-launch same brief
Scenario 2: Permission/question blocker
- Action: Resume in foreground
- How: Use TaskOutput to get task_id, then Task tool with resume parameter
- Example:
Task blocked on permission for "run database migrations"
→ Resume in foreground, approve permission, continue
Scenario 3: Logic gap (unclear contract/spec)
- Action: Create recovery task
- Steps:
1. Create new task for Architect: "Clarify [missing requirement]"
2. Run Architect task (foreground)
3. Update brief for blocked task
4. Re-launch subagent
Scenario 4: Unrecoverable failure
- Action: Mark BLOCKED and continue
- Steps:
1. Update plan.md: Status: BLOCKED(reason: ...)
2. Save partial work in reports/T##-partial.md
3. Add to integration report: "T## blocked, manual resolution needed"
4. Continue with other waves (don't block entire workflow)
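The four scenarios above amount to a small decision table. Below is a minimal sketch in Python. The failure-kind labels, the `Failure` class, and `choose_fallback` are hypothetical names used for illustration, not part of the skill. The choice to mark a task blocked after its one automatic retry has failed is also an assumption, since the text only says "retry once".

```python
from dataclasses import dataclass

# Hypothetical failure kinds mapped to the action each scenario describes.
FALLBACKS = {
    "transient": "retry_once",             # Scenario 1: timeout, network
    "permission": "resume_foreground",     # Scenario 2: permission/question blocker
    "logic_gap": "create_recovery_task",   # Scenario 3: unclear contract/spec
    "unrecoverable": "mark_blocked",       # Scenario 4
}


@dataclass
class Failure:
    task_id: str
    kind: str
    retries: int = 0  # automatic retries already attempted


def choose_fallback(failure: Failure) -> str:
    """Pick the fallback action for a failed subagent task."""
    # Unknown failure kinds are treated as unrecoverable (assumption).
    action = FALLBACKS.get(failure.kind, "mark_blocked")
    # Only one automatic retry is allowed; afterwards the task is
    # marked blocked (assumption, see lead-in).
    if action == "retry_once" and failure.retries >= 1:
        return "mark_blocked"
    return action
```

Keeping the mapping in a dictionary makes it easy to see at a glance which scenarios need a human (foreground resume, recovery task) and which do not.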
Step 3: Wave Completion Check
Before proceeding to next wave:
- [ ] All tasks in wave completed OR marked BLOCKED
- [ ] All reports saved to reports/
- [ ] No merge conflicts detected (if parallel edits)
- [ ] All acceptance criteria met (or exceptions documented)
If wave has blockers:
- Document in orchestration_plan.md (update Progress section)
- Decide: resolve now OR defer to integration phase
Step 4: Proceed to Next Wave
Repeat Step 2 for next wave.
Important:
- Respect dependencies: Wave N can only start when all Wave N-1 tasks are DONE or BLOCKED
- Update orchestration_plan.md with actual completion times (for future estimation)
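The dependency rule (a wave may start only when every task in the previous wave is DONE or BLOCKED) can be sketched as a small check. This assumes task statuses have already been read from plan.md into a dictionary. The function names are hypothetical, and the `BLOCKED(reason: ...)` form follows the status format shown in the fallback strategy.

```python
def wave_can_advance(statuses: dict) -> bool:
    """True when every task in the wave is DONE or BLOCKED(...)."""
    return all(
        status == "DONE" or status.startswith("BLOCKED")
        for status in statuses.values()
    )


def blocked_tasks(statuses: dict) -> list:
    """Task IDs that still need to be documented in orchestration_plan.md."""
    return sorted(t for t, s in statuses.items() if s.startswith("BLOCKED"))


wave_3 = {
    "T04": "DONE",
    "T05": "BLOCKED(reason: missing DB credentials)",
    "T06": "DONE",
}
print(wave_can_advance(wave_3))  # True
print(blocked_tasks(wave_3))     # ['T05']
```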
Step 5: All Waves Complete
When all waves finished:
- Update plan.md: Status: Integration
- Proceed to Phase 4 (Integration & IOSM Gates)
PARALLEL LAUNCH EXAMPLES
Example 1: Wave 2 (3 foreground tasks)
Launching Wave 2 (Design phase) with 3 tasks:
[Single message with 3 Task calls, all foreground]
These tasks will run interactively (you'll see their prompts).
Expected: ~4-6 hours for slowest task (T01)
Example 2: Wave 3 (3 background tasks)
Launching Wave 3 (Implementation) with 3 background tasks:
[Single message with 3 Task calls, all run_in_background: true]
Monitor with: /bashes
Check outputs in: swarm/tracks/2026-01-17-001/reports/
Example 3: Mixed wave (2 parallel + 1 sequential)
Wave 4a: Launching 2 parallel tasks (T08, T10):
[Single message with 2 Task calls, background]
When T08 completes, I'll launch Wave 4b (T09 depends on T08).
Phase 4: Integration & IOSM Gates
After subagents complete:
- Read all reports from swarm/tracks/<id>/reports/
- Validate each report has required sections (see templates/subagent_report.md)
- Identify conflicts:
- File modification overlaps
- Contradictory decisions
- Dependency mismatches
- Generate integration_report.md with:
- What changed (by task)
- Conflict resolutions
- Merge order (respecting dependencies)
- Final verification checklist
- Rollback guide
See templates/integration_report.md.
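Of the conflict types listed above, file modification overlaps are the easiest to detect mechanically. Here is a minimal sketch, assuming the list of modified files has already been extracted from each report into a dictionary keyed by task ID. The extraction step and the function name are hypothetical and not shown in the skill.

```python
from collections import defaultdict


def find_file_overlaps(touched: dict) -> dict:
    """Map each file modified by more than one task to those task IDs."""
    tasks_by_file = defaultdict(set)
    for task_id, files in touched.items():
        for path in files:
            tasks_by_file[path].add(task_id)
    # Keep only files that at least two tasks modified.
    return {
        path: sorted(tasks)
        for path, tasks in sorted(tasks_by_file.items())
        if len(tasks) > 1
    }


overlaps = find_file_overlaps({
    "T03": ["backend/core/__init__.py"],
    "T04": ["backend/core/__init__.py", "backend/auth.py"],
    "T07": ["backend/payments.py"],
})
print(overlaps)  # {'backend/core/__init__.py': ['T03', 'T04']}
```

Each entry in the result is a candidate for the "Conflict resolutions" section of integration_report.md.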
IOSM Quality Gates Evaluation
After integration_report.md is complete, run IOSM gates on integrated result:
Gate-I (Improve):
- Semantic clarity ≥0.95 (clear naming, no magic numbers)
- Code duplication ≤5%
- Invariants documented
- All TODOs tracked
Gate-O (Optimize):
- P50/P95/P99 latency measured
- Error budget defined
- Basic chaos/resilience tests passing
- No obvious N+1 queries or memory leaks
Gate-S (Shrink):
- API surface reduced ≥20% (or justified growth)
- Dependency count stable or reduced
- Onboarding time ≤15min for new contributor
Gate-M (Modularize):
- Clear module contracts
- Change surface ≤20% (localized impact)
- Coupling/cohesion metrics acceptable
- No circular dependencies
Calculate IOSM-Index:
IOSM-Index = (Gate-I + Gate-O + Gate-S + Gate-M) / 4
Target: ≥0.80 for production merge.
Generate swarm/tracks/<id>/iosm_report.md with gate results.
See templates/iosm_gates.md for detailed criteria.
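The index formula is a plain average of the four gate scores. A minimal sketch follows, assuming each gate has already been scored on a 0 to 1 scale. The function names and the sample scores are illustrative only; how each gate score is derived is defined in templates/iosm_gates.md, not here.

```python
def iosm_index(gate_i: float, gate_o: float, gate_s: float, gate_m: float) -> float:
    """Average the four gate scores into the IOSM-Index."""
    scores = (gate_i, gate_o, gate_s, gate_m)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("gate scores must lie in [0, 1]")
    return sum(scores) / 4


def ready_for_merge(index: float, threshold: float = 0.80) -> bool:
    """Compare the index with the production-merge target."""
    return index >= threshold


index = iosm_index(0.90, 0.80, 0.85, 0.80)  # roughly 0.84
print(ready_for_merge(index))               # True
```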
File Structure
The Skill creates this structure:
.claude/skills/swarm-iosm/ # Skill definition
SKILL.md # This file
templates/ # Progressive disclosure templates
scripts/ # Validation/analysis scripts
swarm/ # Project workflow data
context/ # Project-wide context
product.md
tech-stack.md
workflow.md
tracks/ # Feature/task tracks
<YYYY-MM-DD-NNN>/ # Track directory
intake.md # Requirements intake
PRD.md # Product requirements
spec.md # Technical spec
plan.md # Implementation plan
metadata.json # Track metadata
reports/ # Subagent reports
T01.md
T02.md
...
integration_report.md # Integration plan
iosm_report.md # Quality gate results
rollback_guide.md # Revert instructions (if needed)
tracks.md # Track registry/index
Best Practices
- Always resolve questions upfront - Background subagents can't ask questions
- Use Plan mode for brownfield - Safe exploration before changes
- Parallelize research, sequence implementation - Avoid file conflicts
- Demand structured reports - Traceability and integration depend on it
- Run IOSM gates before merge - Quality enforcement
- Create rollback plans - Safety net for production changes
- Use TodoWrite - Track overall Swarm workflow progress
- Monitor background tasks - Use the /bashes command
Common Patterns
Pattern 1: Greenfield Feature
/swarm-iosm new-track "Add email notification system"
→ Intake (quick, no repo analysis)
→ PRD + Plan generation
→ Parallel: Architect (API design) + DocsWriter (email templates)
→ Sequential: Implementer (core) → TestRunner → Integration
Pattern 2: Brownfield Refactor
/swarm-iosm setup
/swarm-iosm new-track "Refactor payment processing"
→ Plan mode: Explorer analyzes payment module
→ Architect creates migration plan
→ Parallel: Implementer-A (new code) + TestRunner (regression tests)
→ Integration with rollback guide
Pattern 3: Large Feature with Many Tasks
/swarm-iosm new-track "Multi-tenant architecture"
→ Generate plan with 15+ tasks
→ Phase 1: Sequential design (Architect → review)
→ Phase 2: Parallel implementation (3x Implementer background)
→ Phase 3: Sequential integration (merge → test → gates)
Troubleshooting
Background subagent fails with permission error:
- Resume in foreground: Find task in /bashes, get task ID, resume
- Pre-approve permissions: Use AskUserQuestion before launching
Reports missing or incomplete:
- Subagent brief must explicitly require report template
- Validate reports using scripts/summarize_reports.py
File conflicts during integration:
- Plan should minimize shared file edits
- Use git branches per subagent (advanced)
- Integration report must resolve conflicts manually
IOSM gates failing:
- Review gate criteria in templates/iosm_gates.md
- Some gates may be aspirational (document exceptions)
- Iterate: fail → fix → re-check
Advanced Usage
See additional documentation:
- templates/ - All templates with detailed examples
- scripts/ - Helper scripts for validation and analysis
Dependencies
- Claude Code with Task tool support
- Git (for version control and rollback)
- Project-specific: Python/Node/etc for running tests
Version
Swarm Workflow (IOSM) v2.1 - 2026-01-19
v2.1 Changes:
- Automated State Management (auto-generated iosm_state.md)
- Status Sync CLI (--update-task)
- Improved Report Conflict Detection
v2.0 Changes:
- Inter-Agent Communication (Shared Context)
- Task Dependency Visualization (--graph)
- Anti-Pattern Detection
- Template Customization
v1.3 Changes:
- Simulation Mode (/swarm-iosm simulate) with ASCII Timeline
- Live Monitoring (/swarm-iosm watch)
- Checkpointing & Resume (/swarm-iosm resume)
v1.2 Changes:
- Concurrency Limits (Resource Budgets)
- Cost Tracking & Model Selection (Haiku/Sonnet/Opus)
- Intelligent Error Diagnosis & Retry (/swarm-iosm retry)
v1.1 Changes:
- Continuous Dispatch Loop (don't wait for the wave; launch as soon as a task is READY)
- Gate-driven continuation (keep working until Gate targets are reached)
- Auto-spawn from SpawnCandidates in reports
- Touches lock manager (file conflicts)
- iosm_state.md for tracking progress toward the gates
v1.1.1 Changes:
- Lock Granularity (folder vs file hierarchy, path normalization)
- Read-Only Safety Rules (scratch_dir for artifacts)
- Spawn Protection (budget, dedup keys, severity threshold)
- Anti-Loop Protection (max iterations, progress tracking)
- Batch Constraints (max 3-6 per batch, priority ordering, batch_id)
- Touched Actual tracking (plan vs actual diff, unplanned touches alert)
- Operational Runbook in QUICKSTART.md
# README.md
Swarm-IOSM
Parallel Orchestration Engine for Claude Code with Built-in Quality Gates
🎯 What is Swarm-IOSM?
Swarm-IOSM is an advanced orchestration engine for Claude Code that transforms complex development tasks into coordinated parallel work streams with enforced quality standards.
It implements the IOSM methodology (Improve → Optimize → Shrink → Modularize) as an executable system for parallel AI agent coordination, combining:
- 🤖 Intelligent Orchestration — Continuous dispatch scheduling with dependency analysis
- 🔒 File Conflict Detection — Lock management prevents parallel write conflicts
- 📋 PRD-Driven Planning — Structured requirements → decomposition → execution
- ✅ IOSM Quality Gates — Automated code quality, performance, and modularity checks
- 🔄 Auto-Spawn Protocol — Dynamic task discovery and creation during execution
- 📊 Cost Tracking — Budget guardrails with usage monitoring
Core Model:
Touches → Locks → Gates → Done
A correctness model for parallel agent work: declare what files you touch, acquire locks to prevent conflicts, pass quality gates, ship.
Why Swarm-IOSM?
Traditional development workflows struggle with:
- Sequential bottlenecks — One task blocks the next, wasting time
- Context loss — Large features lack structured documentation
- Quality debt — No systematic enforcement of engineering standards
- Manual coordination — Developers spend time orchestrating instead of building
Swarm-IOSM solves these by:
- Parallelizing independent work streams (commonly 3–8x faster than sequential, depends on task independence)
- Enforcing IOSM quality gates before merge
- Automating task decomposition and subagent coordination
- Tracking all decisions and artifacts for full traceability
What Swarm-IOSM is NOT
To set clear expectations:
- ❌ Not a general-purpose workflow engine — Designed specifically for Claude Code agent orchestration
- ❌ Not a replacement for CI/CD — Complements your pipeline, doesn't replace Jenkins/GitHub Actions
- ❌ Not a code generator "autopilot" — Requires human oversight and decision-making
- ❌ Not safe to run unattended on production repos — Always review changes before merge
⚡ 60-Second Demo
# Install
git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm
# In Claude Code
/swarm-iosm setup
/swarm-iosm new-track "Add JWT authentication"
What you get:
- swarm/tracks/<id>/PRD.md — Requirements document
- swarm/tracks/<id>/plan.md — Task breakdown with dependencies
- swarm/tracks/<id>/reports/ — Subagent execution reports (after /swarm-iosm implement)
- swarm/tracks/<id>/integration_report.md — Merge plan & results
- swarm/tracks/<id>/iosm_report.md — Quality gate evaluation
See complete example: examples/demo-track/ — Full track from PRD to merge (7 tasks, Redis caching feature)
🌟 Key Features
Core Capabilities
| Feature | Description | Benefits |
|---|---|---|
| Continuous Dispatch Loop | Tasks launch immediately when dependencies are met | No artificial wave barriers, maximum parallelism |
| Parallel Subagent Execution | Up to 8 simultaneous background/foreground agents | Often 3-8x faster than sequential execution |
| IOSM Quality Gates | Automated checks for code quality, performance, complexity | Quality-gated before merge |
| File Lock Management | Hierarchical conflict detection (file/folder) | Safe parallel writes, prevents merge conflicts |
| Auto-Spawn from Discoveries | Subagents report new work → orchestrator schedules | Self-organizing workflow adaptation |
| Intelligent Error Recovery | Pattern-based diagnosis with suggested fixes | Auto-diagnosis with 3 retry limit |
| Cost & Budget Control | Token usage tracking with budget guardrails | Predictable API costs (default: $10 limit) |
| Checkpoint & Resume | Crash recovery from last known state | Fault-tolerant long-running tasks |
Feature Status
| Feature | Status | Command/Location |
|---|---|---|
| ✅ Inter-Agent Communication | Available in v2.0+ | shared_context.md auto-updated |
| ✅ Task Dependency Visualization | Available in v2.0+ | --graph flag in orchestration planner |
| ✅ Anti-Pattern Detection | Available in v2.0+ | Auto-warns during planning |
| ✅ Template Customization | Available in v2.0+ | Override in swarm/templates/ |
| ✅ Simulation Mode | Available in v1.3+ | /swarm-iosm simulate |
| ✅ Checkpoint & Resume | Available in v1.3+ | /swarm-iosm resume |
| 🧪 Live Monitoring | Experimental | /swarm-iosm watch (basic implementation) |
| 🗺️ IDE Integration | Roadmap | VS Code extension planned |
| 🗺️ CI/CD Templates | Roadmap | GitHub Actions / GitLab CI examples |
🏗️ Architecture
System Overview
┌──────────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR (Main Claude Agent) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Continuous Dispatch Loop (v1.1+) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ Collect │→ │ Classify │→ │ Conflict │→ │ Dispatch Batch │ │ │
│ │ │ Ready │ │ Modes │ │ Check │ │ (max 3-6 tasks) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ │
│ │ ↑ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ↓ │ │
│ │ └────────│ IOSM │←─│ Auto- │←────────┘ │ │
│ │ │ Gates │ │ Spawn │ │ │
│ │ └──────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ ↓ ↓ ↓ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌─────────────────┐ │
│ │ Subagent (BG) │ │ Subagent (BG) │ │ Subagent (FG) │ │
│ │ Explorer │ │ Implementer-A │ │ Architect │ │
│ │ read-only │ │ write-local │ │ needs_user │ │
│ └────────────────────┘ └────────────────────┘ └─────────────────┘ │
│ │ │ │ │
│ ↓ ↓ ↓ │
│ reports/T01.md reports/T02.md reports/T03.md │
│ + SpawnCandidates + SpawnCandidates + Escalations │
└──────────────────────────────────────────────────────────────────────┘
IOSM Framework Integration
┌────────────────────────────────────────────────────────────────────────────┐
│ IOSM FRAMEWORK │
│ https://github.com/rokoss21/IOSM │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ IMPROVE │ → │ OPTIMIZE │ → │ SHRINK │ → │ MODULARIZE │ │
│ │ │ │ │ │ │ │ │ │
│ │ Clarity │ │ Speed │ │ Simplify │ │ Decompose │ │
│ │ No dups │ │ Resil. │ │ Surface │ │ Contracts │ │
│ │ Invars │ │ Chaos │ │ Deps │ │ Coupling │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────────┬─────────┘ │
│ │ │ │ │ │
│ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ ┌────────▼─────────┐ │
│ │ Gate-I │ │ Gate-O │ │ Gate-S │ │ Gate-M │ │
│ │ ≥0.85 │ │ ≥0.75 │ │ ≥0.80 │ │ ≥0.80 │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │
│ │
│ IOSM-Index = (Gate-I + Gate-O + Gate-S + Gate-M) / 4 │
│ Production threshold: ≥ 0.80 │
└────────────────────────────────────────────────────────────────────────────┘
Task State Machine
┌──────────┐
│ backlog │ All known tasks
└────┬─────┘
│ dependencies satisfied
↓
┌──────────┐
│ ready │ Eligible for dispatch
└────┬─────┘
│ no file conflicts
├─────────────────┬─────────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────────┐ ┌──────────────────┐
│ running │ │ blocked_user │ │ blocked_conflict │
│(BG or FG)│ │needs decision│ │ file lock held │
└────┬─────┘ └──────────────┘ └──────────────────┘
│ completes │ lock released
↓ ↓
┌──────────┐ ┌──────────┐
│ done │←────────────────────────│ ready │
└──────────┘ └──────────┘
│ spawn candidates
↓
┌──────────┐
│ backlog │ (auto-spawned tasks)
└──────────┘
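The state diagram above can be encoded as a transition table. This is a minimal sketch with hypothetical names (`State`, `TRANSITIONS`, `advance`). It includes only the edges the diagram draws. The diagram shows no outgoing edge from blocked_user, so none is assumed here, and auto-spawned tasks enter the backlog as new tasks rather than as a transition of the finished one.

```python
from enum import Enum


class State(Enum):
    BACKLOG = "backlog"
    READY = "ready"
    RUNNING = "running"
    BLOCKED_USER = "blocked_user"
    BLOCKED_CONFLICT = "blocked_conflict"
    DONE = "done"


# Allowed transitions, as drawn in the diagram.
TRANSITIONS = {
    State.BACKLOG: {State.READY},                 # dependencies satisfied
    State.READY: {State.RUNNING, State.BLOCKED_USER, State.BLOCKED_CONFLICT},
    State.RUNNING: {State.DONE},                  # task completes
    State.BLOCKED_CONFLICT: {State.READY},        # lock released
    State.BLOCKED_USER: set(),                    # no edge shown in the diagram
    State.DONE: set(),                            # terminal for this task
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting undrawn edges early makes state-tracking bugs (for example, a task jumping from backlog straight to running) visible at the point where they occur.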
🚀 Quick Start
See the 60-Second Demo above for immediate hands-on, or follow the complete guide:
📖 Full Tutorial: QUICKSTART.md
Key Commands:
/swarm-iosm setup # Initialize project
/swarm-iosm new-track "..." # Create feature track
/swarm-iosm implement # Execute plan
/swarm-iosm integrate <id> # Merge & run quality gates
Need help? See TROUBLESHOOTING.md for common issues.
📚 Documentation
Core Documentation
| Document | Purpose | Audience |
|---|---|---|
| SKILL.md | Complete specification (1330+ lines) | Advanced users, contributors |
| QUICKSTART.md | 5-minute intro with examples | First-time users |
| RUNBOOK.md | Manual orchestration operations | Power users |
| VALIDATION.md | Installation & config checklist | DevOps, QA |
| TROUBLESHOOTING.md | Common issues & solutions | All users |
Templates (Progressive Disclosure)
Located in templates/:
- prd.md — Product Requirements Document (10 sections)
- plan.md — Implementation plan with dependencies
- subagent_brief.md — Task instructions for subagents
- subagent_report.md — Structured output format
- iosm_gates.md — Quality gate criteria & scoring
- iosm_state.md — Live execution state tracker
- integration_report.md — Merge plan & conflict resolution
- shared_context.md — Inter-agent communication
- intake_questions.md — Requirements gathering
Scripts (Automation)
Located in scripts/:
- orchestration_planner.py — Generate dispatch plan from plan.md
- validate_plan.py — Check plan structure & dependencies
- summarize_reports.py — Aggregate subagent outputs
- merge_context.py — Update shared context from reports
- parse_errors.py — Error diagnosis & fix suggestions
- error_patterns.py — Known error patterns library
- errors.py — Error handling utilities
💡 Use Cases
1. Greenfield Feature Development
Scenario: Add complete email notification system to SaaS app
Workflow:
/swarm-iosm new-track "Add email notification system"
→ Intake (mode: greenfield, priority: quality)
→ PRD generation (15 min)
→ Decomposition:
- T01: Design email templates (Architect, foreground)
- T02: Implement SMTP service (Implementer-A, background)
- T03: Add queue system (Implementer-B, background, parallel with T02)
- T04: Write integration tests (TestRunner, background, after T02+T03)
- T05: Add API endpoints (Implementer-C, background, after T02)
→ Execute (4-6 hours parallel, vs 12-15h serial)
→ IOSM gates: All pass (Gate-I: 0.92, Gate-O: 0.88, Gate-S: 0.85, Gate-M: 0.90)
→ Deploy with confidence
Results:
- ⚡ ~3x faster (4-6h parallel vs 12-15h sequential)
- ✅ 100% test coverage (Gate-O enforcement)
- 📉 Minimal technical debt (Gate-I: 0.92 clarity score)
- 🔄 Full rollback plan auto-generated
2. Brownfield Refactoring
Scenario: Refactor legacy payment processing module (5000+ LOC, 3 years old)
Workflow:
/swarm-iosm new-track "Refactor payment processing"
→ Plan mode exploration (T00: Explorer analyzes codebase)
→ PRD with rollback strategy
→ Decomposition:
- T01: Map existing payment flows (Explorer, background, read-only)
- T02: Design new module boundaries (Architect, foreground)
- T03: Write comprehensive regression tests (TestRunner, background, after T01)
- T04: Implement new PaymentService (Implementer-A, background, after T02+T03)
- T05: Migrate first payment method (Implementer-B, background, after T04)
- T06: Security audit (SecurityAuditor, foreground, after T05)
- T07: Performance benchmark (PerfAnalyzer, background, after T05)
→ Gate-M fails (circular dependency detected)
→ Auto-spawn: T08 "Break circular import between Payment and Invoice"
→ Re-check Gate-M: Pass
→ Integrate with rollback guide
Results:
- 🎯 Gate-driven quality — Forced resolution of hidden issues
- 🔒 Safe refactor — All tests passing before merge
- 📊 Measured improvement — 40% reduction in module coupling
- 🗺️ Clear rollback path — Database + code revert instructions
3. Multi-Module Feature with Dependencies
Scenario: Add multi-tenant architecture (affects 8 modules)
Workflow:
/swarm-iosm new-track "Multi-tenant architecture"
→ PRD: 20+ tasks identified
→ Orchestration plan:
- Wave 1: T01 Design schema (Architect, foreground, critical path)
- Wave 2: T02-T04 Database migration scripts (Implementer-A,B,C, parallel, after T01)
- Wave 3: T05-T10 Update 6 modules (6 Implementers, parallel, after Wave 2)
- Wave 4: T11-T15 Tests (5 TestRunners, parallel, after Wave 3)
- Wave 5: T16 Integration (Integrator, foreground, after Wave 4)
→ Execute with continuous dispatch (no wave barriers)
→ T05 spawns SC-01: "Add tenant_id index to sessions table" (auto-spawn)
→ Cost tracking: $6.50 / $10.00 budget used
→ IOSM Index: 0.82 (above threshold)
Results:
- 📈 High parallelism — 6 modules updated simultaneously
- 💰 Budget control — $6.50 spent (within $10 limit)
- 🔍 Auto-discovery — 3 critical tasks auto-spawned from findings
- ⏱️ Time savings — ~18h parallel vs 60h+ sequential (example track)
🏆 IOSM Quality Gates
Each track enforces 4 quality gates before merge:
Gate-I: Improve (Code Quality)
semantic_coherence: ≥0.95 # Clear naming, no magic numbers
duplication_max: ≤0.05 # Max 5% duplicate code
invariants_documented: true # Pre/post-conditions
todos_tracked: true # All TODOs in issue tracker
Measured by:
- AST analysis (identifiers, literals)
- Clone detection (structural similarity)
- Docstring coverage
Gate-O: Optimize (Performance & Resilience)
latency_ms:
p50: ≤100
p95: ≤200
p99: ≤500
error_budget_respected: true
chaos_tests_pass: true
no_obvious_inefficiencies: true # N+1 queries, memory leaks
Measured by:
- Load testing (locust, k6)
- Chaos engineering (kill processes, network faults)
- Profiling (py-spy, perf)
Gate-S: Shrink (Minimal Complexity)
api_surface_reduction: ≥0.20 # Or justified growth
dependency_count_stable: true
onboarding_time_minutes: ≤15
Measured by:
- Public API endpoint/function count
- requirements.txt / package.json diff
- README clarity test
Gate-M: Modularize (Clean Boundaries)
contracts_defined: 1.0 # 100% of modules
change_surface_max: 0.20 # ≤20% of codebase touched
no_circular_deps: true
coupling_acceptable: true
Measured by:
- Dependency graph analysis
- Interface stability metrics
- Import cycle detection
IOSM-Index Calculation
IOSM-Index = (Gate-I + Gate-O + Gate-S + Gate-M) / 4
Production Threshold: ≥ 0.80
Auto-spawn rules:
- If Gate-I < 0.75 → Spawn clarity/duplication fixes
- If Gate-O fails → Spawn test/performance fixes
- If Gate-M fails → Spawn boundary clarification tasks
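The auto-spawn rules read as three independent checks. Below is a minimal sketch, assuming Gate-I arrives as a numeric score while Gate-O and Gate-M arrive as pass/fail flags, which is how the rules above phrase them. The function name and the returned labels are hypothetical.

```python
def spawn_for_gates(gate_i: float, gate_o_pass: bool, gate_m_pass: bool) -> list:
    """Return the kinds of follow-up tasks the auto-spawn rules call for."""
    spawned = []
    if gate_i < 0.75:
        spawned.append("clarity/duplication fixes")
    if not gate_o_pass:
        spawned.append("test/performance fixes")
    if not gate_m_pass:
        spawned.append("boundary clarification")
    return spawned


# A Gate-M failure with healthy Gate-I and Gate-O spawns one kind of task.
print(spawn_for_gates(0.92, True, False))  # ['boundary clarification']
```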
🛠️ Commands Reference
| Command | Description | Mode |
|---|---|---|
| /swarm-iosm setup | Initialize project context | Auto |
| /swarm-iosm new-track "<desc>" | Create feature track with PRD | Auto |
| /swarm-iosm implement [track-id] | Execute implementation plan | Auto |
| /swarm-iosm status [track-id] | Check progress & errors | Read-only |
| /swarm-iosm watch [track-id] | Live monitoring dashboard (v1.3) | Read-only |
| /swarm-iosm simulate [track-id] | Dry-run with timeline (v1.3) | Read-only |
| /swarm-iosm resume [track-id] | Resume from checkpoint (v1.3) | Auto |
| /swarm-iosm retry <task-id> [opts] | Retry failed task (v1.2) | Auto |
| /swarm-iosm integrate <track-id> | Merge work + run IOSM gates | Auto |
| /swarm-iosm revert-plan <track-id> | Generate rollback guide | Read-only |
Retry Options:
- --foreground — Run interactively for debugging
- --reset-brief — Regenerate task brief from scratch
🧩 Subagent Roles
Standard Roles
| Role | Purpose | Concurrency | Tools | When to Use |
|---|---|---|---|---|
| Explorer | Codebase analysis, IOSM baseline | read-only | Read, Grep, Glob | Brownfield projects, initial assessment |
| Architect | Design decisions, API contracts | write-local | Read, Write (docs) | Complex features, architectural changes |
| Implementer-{A,B,C} | Parallel implementation | write-local | Read, Write, Edit, Bash | Independent modules |
| TestRunner | Gate-O verification | read-only | Read, Bash | After implementation, before merge |
| SecurityAuditor | Gate-I security invariants | read-only | Read, Grep, Bash | Auth, payments, PII handling |
| PerfAnalyzer | Gate-O performance | read-only | Read, Bash (profiling) | High-traffic features, data processing |
| DocsWriter | Gate-S onboarding | write-local | Read, Write, Edit | Public APIs, user-facing features |
Concurrency Classes
| Class | Lock Behavior | Parallel Execution | Example |
|---|---|---|---|
| read-only | No lock | Always parallel | Code analysis, tests |
| write-local | Lock on touches | Parallel if no overlap | Module implementation |
| write-shared | Exclusive lock | Sequential only | Database migrations |
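The three concurrency classes can be read as a pairwise rule for whether two tasks may run together. Here is a minimal sketch with a hypothetical function name. One assumption is labeled in the code: the table does not say what happens when a read-only task meets a write-shared one, so this sketch lets the exclusive lock win.

```python
CLASSES = {"read-only", "write-local", "write-shared"}


def can_run_in_parallel(class_a: str, class_b: str, touches_overlap: bool) -> bool:
    """Decide whether two tasks may run at the same time."""
    if class_a not in CLASSES or class_b not in CLASSES:
        raise ValueError("unknown concurrency class")
    pair = {class_a, class_b}
    if "write-shared" in pair:
        # Exclusive lock: sequential only. Applying this even against
        # read-only tasks is an assumption, not stated in the table.
        return False
    if "read-only" in pair:
        # Read-only tasks take no lock, so they never conflict.
        return True
    # Both tasks are write-local: parallel only if their touches are disjoint.
    return not touches_overlap
```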
📊 Cost Tracking & Budgets
Model Selection (v1.2)
Swarm-IOSM automatically selects the optimal model:
| Model | Use Case | Cost (input/output per 1M tokens) |
|---|---|---|
| Haiku | Read-only analysis, simple tasks | $0.25 / $1.25 |
| Sonnet | Standard implementation, tests | $3.00 / $15.00 |
| Opus | Architecture, security, critical decisions | $15.00 / $75.00 |
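To see how the per-model prices translate into task cost, the arithmetic can be sketched as follows. The prices are copied from the table above. The function name and the token counts are hypothetical, and real usage figures would come from the tracking data, which is not shown here.

```python
# (input, output) price in USD per 1M tokens, from the table above.
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one task from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical implementation task: 200k tokens in, 40k tokens out on Sonnet.
cost = estimate_cost("sonnet", 200_000, 40_000)
print(f"${cost:.2f}")  # $1.20
```

Output tokens cost five times as much as input tokens for every model in the table, so long generated outputs can dominate the bill.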
Budget Controls
Default limits:
- max_parallel_background: 6
- max_parallel_foreground: 2
- max_total_parallel: 8
- cost_limit_per_track: $10.00
Budget alerts:
- ⚠️ 80% usage → Warning notification
- 🛑 100% usage → Pause execution, await user decision
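The two alert thresholds reduce to a simple comparison against the track limit. This is a minimal sketch with a hypothetical function name and return labels, using the default $10.00 limit from above.

```python
def budget_status(spent: float, limit: float = 10.00) -> str:
    """Classify current spend against the per-track cost limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = spent / limit
    if usage >= 1.0:
        return "pause"  # 100% usage: pause execution, await user decision
    if usage >= 0.8:
        return "warn"   # 80% usage: warning notification
    return "ok"


print(budget_status(6.50))   # ok
print(budget_status(8.50))   # warn
print(budget_status(10.00))  # pause
```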
Check current spend:
grep -A5 "Cost Tracking" swarm/tracks/<id>/iosm_state.md
🔄 Continuous Dispatch Loop (v1.1+)
Key Innovation: No Wave Barriers
Traditional orchestration waits for entire "waves" to complete. Swarm-IOSM dispatches tasks immediately when dependencies are satisfied.
Before (Wave-based):
Wave 1: [T01, T02, T03] → Wait for ALL to finish
Wave 2: [T04, T05] → Can't start until Wave 1 done
After (Continuous Dispatch):
T01 done → T04 starts immediately (even if T02, T03 still running)
Dispatch Algorithm
while not gates_met:
    # 1. Collect ready tasks (deps satisfied, no conflicts)
    ready = [t for t in backlog if deps_satisfied(t) and not conflicts(t)]
    # 2. Classify by mode (background vs foreground)
    bg = [t for t in ready if can_auto_background(t)]
    fg = [t for t in ready if needs_user_input(t)]
    # 3. Dispatch batch (max 3-6 tasks)
    launch_parallel(bg[:6], mode='background')
    launch_parallel(fg[:2], mode='foreground')
    # 4. Monitor & spawn
    for report in collect_completed():
        spawn_candidates = parse_spawn_candidates(report)
        backlog.extend(deduplicate(spawn_candidates))
    # 5. Check gates
    if all_gates_pass():
        break
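The loop above relies on helpers it does not define, so it cannot be run as written. The following self-contained toy simulation shows only the scheduling idea: a task starts as soon as its own dependencies finish, with no wave barrier. It is a sketch under stated assumptions, not the skill's implementation. Task durations are invented, and lock checks, background/foreground classification, auto-spawn and gate evaluation are all left out. `simulate_dispatch` is a hypothetical name.

```python
import heapq


def simulate_dispatch(tasks, max_parallel=6):
    """Simulate continuous dispatch on a toy task graph.

    tasks maps a task ID to (duration, [dependency IDs]).
    Returns {task_id: (start_time, finish_time)}.
    """
    pending = set(tasks)
    done = set()
    running = []  # min-heap of (finish_time, task_id)
    start, finish = {}, {}
    now = 0
    while pending or running:
        # Launch every task whose dependencies are all done, up to the cap.
        ready = sorted(t for t in pending if all(d in done for d in tasks[t][1]))
        for task_id in ready:
            if len(running) >= max_parallel:
                break
            pending.remove(task_id)
            start[task_id] = now
            heapq.heappush(running, (now + tasks[task_id][0], task_id))
        if not running:
            raise ValueError("remaining tasks have unsatisfiable dependencies")
        # Advance the clock to the next single completion (no wave barrier).
        now, finished = heapq.heappop(running)
        done.add(finished)
        finish[finished] = now
    return {t: (start[t], finish[t]) for t in tasks}


timeline = simulate_dispatch({
    "T01": (1, []),
    "T02": (3, []),
    "T03": (3, []),
    "T04": (2, ["T01"]),
})
print(timeline["T04"])  # (1, 3): T04 starts when T01 finishes, while T02 and T03 still run
```

Under a wave barrier, T04 would wait until time 3 for T02 and T03 and finish at time 5. Here it finishes at time 3, alongside them.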
🔐 File Lock Management
Hierarchical Conflict Detection
Lock Granularity:
Lock on FOLDER (core/) conflicts with:
├── Any lock inside (core/a.py, core/b.py)
└── Lock on same folder (core/)
Lock on FILE (core/a.py) conflicts with:
├── Same file only
└── Parent folder lock (core/)
Conflict Matrix Example:
## Lock Plan
Tasks with overlapping touches (sequential only):
- `backend/core/__init__.py`: T03, T04 → ❌ Cannot run parallel
- `backend/api/`: T05, T06 → ❌ Folder conflict
Safe parallel execution:
- `backend/auth.py` (T02) + `backend/payments.py` (T07) → ✅ No overlap
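The hierarchical rule above (two locks conflict when they name the same path, or when one path contains the other) can be sketched with the standard library's `pathlib`. The function name is hypothetical, and paths are assumed to be relative, POSIX-style and free of `..` segments.

```python
from pathlib import PurePosixPath


def locks_conflict(a: str, b: str) -> bool:
    """True when two lock paths are equal or one is an ancestor of the other."""
    path_a, path_b = PurePosixPath(a), PurePosixPath(b)
    return (
        path_a == path_b
        or path_a in path_b.parents  # a is a folder containing b
        or path_b in path_a.parents  # b is a folder containing a
    )


print(locks_conflict("core/", "core/a.py"))                      # True
print(locks_conflict("core/a.py", "core/b.py"))                  # False
print(locks_conflict("backend/auth.py", "backend/payments.py"))  # False
```

`PurePosixPath` drops the trailing slash, so `core/` and `core` compare equal. That gives a basic form of the path normalization mentioned in the v1.1.1 notes.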
Read-Only Safety Rules
Problem: Read-only tasks may accidentally write to caches, lockfiles, __pycache__.
Solution:
1. Read-only tasks write temp files ONLY to swarm/tracks/<id>/scratch/
2. Use --dry-run flags where available
3. Never run npm install, pip install in read-only mode
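Rule 1 can be enforced with a path check before any write from a read-only task. Below is a minimal sketch with a hypothetical function name, assuming paths are given relative to the repository root. Rejecting any path that contains `..` is a conservative choice made for this sketch, not a rule from the skill.

```python
from pathlib import PurePosixPath


def is_allowed_write(path: str, track_id: str) -> bool:
    """True when a read-only task's write stays inside the track's scratch dir."""
    scratch = PurePosixPath("swarm/tracks") / track_id / "scratch"
    target = PurePosixPath(path)
    # Refuse parent-directory segments outright so a path cannot escape
    # the scratch directory (conservative choice for this sketch).
    if ".." in target.parts:
        return False
    return scratch in target.parents


track = "2026-01-17-001"
print(is_allowed_write(f"swarm/tracks/{track}/scratch/notes.txt", track))  # True
print(is_allowed_write("backend/__pycache__/auth.cpython-312.pyc", track))  # False
```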
🚨 Error Recovery (v1.2)
Intelligent Error Diagnosis
When a task fails, Swarm-IOSM provides:
- Error type (e.g., Permission Denied, Import Error)
- Affected file with line number
- Root cause analysis
- 2-4 suggested fixes ranked by likelihood
- Retry command with appropriate flags
Example:
❌ T04 Failed: Permission Denied
File: backend/migrations/001.sql
Cause: Database user lacks CREATE TABLE privilege
Suggested fixes:
1. GRANT CREATE ON DATABASE app TO user; (High confidence)
2. Run migration as admin: sudo -u postgres psql (Medium)
3. Split into smaller migrations (Low)
Retry: /swarm-iosm retry T04 --foreground
Error-Specific Retry Strategies
| Error Type | Auto-Fix | Mode | Max Retries |
|---|---|---|---|
| Permission Denied | No | Foreground | 3 |
| Import Error | Yes (pip install) | Background | 3 |
| Test Failed | No | Foreground | 3 |
| MCP Tool Unavailable | No | Foreground | 1 |
| File Not Found | Maybe | Foreground | 3 |
| Timeout | No | Foreground | 2 |
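The table above works as a lookup from error type to retry mode and retry budget. The sketch below copies the modes and limits from the table. `RETRY_POLICY` and `next_retry_mode` are hypothetical names, and returning `None` for unknown error types is an assumption.

```python
from typing import Optional

# error type -> (retry mode, max retries), copied from the table above.
RETRY_POLICY = {
    "Permission Denied": ("foreground", 3),
    "Import Error": ("background", 3),
    "Test Failed": ("foreground", 3),
    "MCP Tool Unavailable": ("foreground", 1),
    "File Not Found": ("foreground", 3),
    "Timeout": ("foreground", 2),
}


def next_retry_mode(error_type: str, attempts: int) -> Optional[str]:
    """Return the mode for the next retry, or None if no retry is allowed."""
    policy = RETRY_POLICY.get(error_type)
    if policy is None:
        return None  # unknown error type: no automatic retry (assumption)
    mode, max_retries = policy
    return mode if attempts < max_retries else None


print(next_retry_mode("Timeout", 1))               # foreground
print(next_retry_mode("MCP Tool Unavailable", 1))  # None
```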
Retry workflow:
# Standard retry
/swarm-iosm retry T04
# Force interactive debugging
/swarm-iosm retry T04 --foreground
# Regenerate brief (fresh start)
/swarm-iosm retry T04 --reset-brief
🧪 Testing & Validation
Pre-Execution Validation
# Validate plan structure
python scripts/orchestration_planner.py plan.md --validate
# Generate continuous dispatch plan
python scripts/orchestration_planner.py plan.md --continuous
# Simulate execution (dry-run)
/swarm-iosm simulate <track-id>
Post-Execution Validation
# Summarize reports
python scripts/summarize_reports.py swarm/tracks/<id>
# Check IOSM gates
/swarm-iosm integrate <track-id>
# Verify no circular deps
grep -A10 "Gate-M" swarm/tracks/<id>/iosm_report.md
🌐 Integration with IOSM Ecosystem
IOSM Methodology
The theoretical foundation. See IOSM Repository for:
- Complete specification (algorithm, gates, metrics)
- iosm.yaml configuration schema
- CI/CD integration patterns (GitHub Actions, GitLab CI)
- Language-specific checkers (Python, Rust, TypeScript)
Swarm-IOSM (This Repo)
The Claude Code execution engine implementing IOSM for parallel agent orchestration.
FACET Ecosystem
For deterministic AI contracts, see:
- FACET Standard — Contract Layer for AI
- FACET Compiler — Reference Implementation (Rust)
- FACET Agents — Conformance Test Agents
- FACET MCP Server — Protocol Adapter
🗂️ File Structure
.claude/skills/swarm-iosm/
├── SKILL.md # Main skill definition (1330+ lines)
├── README.md # This file
├── QUICKSTART.md # 5-minute tutorial
├── RUNBOOK.md # Manual orchestration operations
├── VALIDATION.md # Installation checklist
├── TROUBLESHOOTING.md # Common issues & solutions
├── LICENSE # MIT License
├── CONTRIBUTING.md # Contribution guidelines
│
├── templates/ # Progressive disclosure templates
│ ├── prd.md # Product Requirements Document
│ ├── plan.md # Implementation plan
│ ├── subagent_brief.md # Task instructions
│ ├── subagent_report.md # Structured output
│ ├── iosm_gates.md # Quality gate criteria
│ ├── iosm_state.md # Live execution state
│ ├── integration_report.md # Merge plan
│ ├── shared_context.md # Inter-agent communication
│ └── intake_questions.md # Requirements gathering
│
├── scripts/ # Automation scripts
│ ├── orchestration_planner.py # Generate dispatch plan
│ ├── validate_plan.py # Plan structure validation
│ ├── summarize_reports.py # Aggregate outputs
│ ├── merge_context.py # Update shared context
│ ├── parse_errors.py # Error diagnosis
│ ├── error_patterns.py # Known error patterns
│ └── errors.py # Error handling utilities
│
└── examples/ # Demo tracks
└── demo-track/ # Example project
├── plan.md
├── continuous_dispatch_plan.md
├── iosm_state.md
└── reports/
swarm/ # Project workflow data (auto-created)
├── context/ # Project metadata
│ ├── product.md # Product overview
│ ├── tech-stack.md # Technology stack
│ └── workflow.md # Development workflow
│
├── tracks/ # Feature tracks
│ └── YYYY-MM-DD-NNN/ # Track directory
│ ├── intake.md # Requirements intake
│ ├── PRD.md # Product requirements
│ ├── spec.md # Technical specification
│ ├── plan.md # Implementation plan
│ ├── metadata.json # Track metadata
│ ├── continuous_dispatch_plan.md # Execution plan
│ ├── iosm_state.md # Live state (auto-updated)
│ ├── shared_context.md # Inter-agent knowledge
│ ├── reports/ # Subagent reports
│ │ ├── T01.md
│ │ ├── T02.md
│ │ └── ...
│ ├── checkpoints/ # Crash recovery
│ │ └── latest.json
│ ├── integration_report.md # Merge plan
│ ├── iosm_report.md # Quality gate results
│ └── rollback_guide.md # Revert instructions
│
└── tracks.md # Track registry
🤝 Contributing
We welcome contributions! Key areas:
High Priority
- Gate Automation Scripts — Measure IOSM criteria automatically
- CI/CD Integration — GitHub Actions, GitLab CI examples
- Language-Specific Checkers — Python, TypeScript, Rust evaluators
Documentation
- More examples in examples/
- Video tutorials
- Integration guides for popular frameworks
Templates
- Additional subagent role templates
- Domain-specific PRD templates
- Custom iosm.yaml configurations
Integrations
- IDE plugins (VS Code, JetBrains)
- Issue tracker integrations (Jira, Linear)
- Monitoring/observability tools
See CONTRIBUTING.md for guidelines.
📜 Version History
v2.1 (2026-01-19) — Current
- Automated State Management (auto-generated iosm_state.md)
- Status Sync CLI (--update-task)
- Improved Report Conflict Detection
v2.0 (2026-01-18)
- Inter-Agent Communication (`shared_context.md`)
- Task Dependency Visualization (`--graph`)
- Anti-Pattern Detection
- Template Customization
v1.3 (2026-01-17)
- Simulation Mode (`/swarm-iosm simulate`) with ASCII Timeline
- Live Monitoring (`/swarm-iosm watch`)
- Checkpointing & Resume (`/swarm-iosm resume`)
v1.2 (2026-01-16)
- Concurrency Limits (Resource Budgets)
- Cost Tracking & Model Selection (Haiku/Sonnet/Opus)
- Intelligent Error Diagnosis & Retry (`/swarm-iosm retry`)
v1.1 (2026-01-15)
- Continuous Dispatch Loop (no wave barriers)
- Gate-Driven Continuation
- Auto-Spawn from SpawnCandidates
- Touches Lock Manager
- `iosm_state.md` Progress Tracking
v1.0 (2026-01-10)
- Initial release
- PRD generation
- Wave-based orchestration
- IOSM quality gates
👤 Author
Emil Rokossovskiy (@rokoss21)
AI & Platform Engineer | Equilibrium LLC
Creator of:
- IOSM Methodology — Reproducible system improvement
- FACET Ecosystem — Deterministic Contract Layer for AI
- Swarm-IOSM — This project
📧 Email: [email protected]
🌐 Web: rokoss21.tech
📄 License
MIT License — Copyright (c) 2026 Emil Rokossovskiy
🔗 Related Projects
| Project | Description | Status |
|---|---|---|
| IOSM | The methodology Swarm-IOSM implements | Active |
| FACET Standard | Deterministic Contract Layer for AI | Active |
| FACET Compiler | Reference Compiler (Rust) | Active |
| FACET Agents | Conformance Test Agents | Active |
| FACET MCP Server | Protocol Adapter | Active |
🎓 Learn More
Documentation
- IOSM Specification — Methodology deep dive
- Claude Code Skills — Official documentation
- AstroVisor.io Case Study — Production IOSM example
Videos & Tutorials
- Swarm-IOSM Quickstart — Complete example track
- IOSM in Practice — AstroVisor case study
Community
- GitHub Issues — Bug reports & feature requests
- Discussions — Questions & ideas
IOSM: Improve → Optimize → Shrink → Modularize
Orchestrate complexity. Enforce quality. Ship faster.
Made with ⚡ by @rokoss21