miles-knowbl

architecture-extractor

# Install this skill:
npx skills add miles-knowbl/orchestrator --skill "architecture-extractor"

Install specific skill from multi-skill repository

# Description

Analyze a source system — codebase, documentation, diagrams, or verbal description — and extract a clean, structured architecture document. Reverse-engineers components, data flows, interfaces, patterns, and constraints into ARCHITECTURE.md.

# SKILL.md


name: architecture-extractor
description: "Analyze a source system — codebase, documentation, diagrams, or verbal description — and extract a clean, structured architecture document. Reverse-engineers components, data flows, interfaces, patterns, and constraints into ARCHITECTURE.md."
phase: EXTRACT
category: specialized
version: "1.0.0"
depends_on: []
tags: [architecture, analysis, extraction, reverse-engineering, transposition]


Architecture Extractor

Analyze a source system and extract its architecture into a structured document.

When to Use

  • Porting an existing system: you need to understand its architecture before rebuilding in a new stack
  • Reverse-engineering: the source system has no architecture docs and you need to create them
  • Architecture audit: you want a clean snapshot of how a system is actually structured
  • Transposition input: the first step of the transpose-loop, where you extract before mapping to a new stack
  • When you say: "extract the architecture", "what's the architecture of this system?", "document how this works"

Required Deliverables

| Deliverable | Location | Condition |
|-------------|----------|-----------|
| ARCHITECTURE.md | Project root | Always |

Core Concept

Architecture extraction answers: "What is the actual architecture of this system?"

This skill works from whatever source material is available:

| Source | Approach |
|--------|----------|
| Codebase | Read entry points, trace data flows, map module boundaries |
| Documentation | Synthesize from READMEs, wikis, API docs |
| Diagrams | Interpret from existing visual representations |
| Verbal description | Structure from user's explanation of how the system works |
| Running system | Infer from API responses, database schema, logs |

Extraction is NOT:
- Designing a new architecture (that's architect)
- Evaluating quality (that's audit-loop)
- Planning a migration (that's migration-planner)

The Extraction Process

SOURCE MATERIAL
     │
     ▼
┌─────────────────────────────────────────────────────────┐
│                 EXTRACTION PROCESS                       │
│                                                         │
│  1. SURVEY                                              │
│     └─→ What source material exists? What's missing?    │
│                                                         │
│  2. MAP COMPONENTS                                      │
│     └─→ Services, modules, layers, external systems     │
│                                                         │
│  3. TRACE DATA FLOWS                                    │
│     └─→ Request paths, event flows, data pipelines      │
│                                                         │
│  4. EXTRACT DATA MODEL                                  │
│     └─→ Entities, relationships, storage patterns       │
│                                                         │
│  5. IDENTIFY INTERFACES                                 │
│     └─→ APIs, protocols, contracts between components   │
│                                                         │
│  6. CATALOG CROSS-CUTTING CONCERNS                      │
│     └─→ Auth, logging, errors, caching, config          │
│                                                         │
│  7. DOCUMENT PATTERNS                                   │
│     └─→ Architectural patterns in use, conventions      │
│                                                         │
│  8. CAPTURE CONSTRAINTS                                 │
│     └─→ What the architecture optimizes for and why     │
│                                                         │
└─────────────────────────────────────────────────────────┘
     │
     ▼
ARCHITECTURE.md

Step 1: Survey Source Material

Assess what's available and identify gaps:

### Source Material Inventory

| Source | Available | Quality | Notes |
|--------|-----------|---------|-------|
| Codebase | Yes/No | High/Medium/Low | [location, language, size] |
| README / docs | Yes/No | | [what's covered] |
| API documentation | Yes/No | | [format, completeness] |
| Database schema | Yes/No | | [access method] |
| Architecture diagrams | Yes/No | | [format, age] |
| User description | Yes/No | | [detail level] |
| Running instance | Yes/No | | [access available] |

### Gaps
- [What's missing that would help]

Prioritize the codebase and the database schema as the most reliable sources. Documentation may be outdated, so treat it as a lead to verify rather than as ground truth.
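To fill in the Codebase row of the inventory, a quick file census is often enough for a first pass. The following is a minimal sketch, not part of the skill itself: the `LANGUAGES` map, the `SKIP_DIRS` set, and both function names are assumptions chosen for illustration. The Quality column is left blank on purpose because it is a judgment call.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language map; extend it for the system under study.
LANGUAGES = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".sql": "SQL", ".md": "Docs"}
# Directories that usually hold vendored or generated files (an assumption).
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def survey_codebase(root: str) -> Counter:
    """Count files per language under root, skipping the directories in SKIP_DIRS."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Only inspect path parts below root, so the root's own location is irrelevant.
        if any(part in SKIP_DIRS for part in path.relative_to(root).parts):
            continue
        language = LANGUAGES.get(path.suffix.lower())
        if language:
            counts[language] += 1
    return counts


def inventory_row(root: str) -> str:
    """Format the Codebase row of the Source Material Inventory table."""
    counts = survey_codebase(root)
    if not counts:
        return "| Codebase | No | | |"
    summary = ", ".join(f"{lang}: {n} files" for lang, n in counts.most_common())
    return f"| Codebase | Yes | | {root}; {summary} |"
```

Calling `inventory_row(".")` from the project root yields a row that can be pasted into the inventory table and then annotated by hand.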

Step 2: Map Components

Identify every distinct component:

### Component Inventory

| Component | Type | Responsibility | Technology | Communicates With |
|-----------|------|---------------|------------|-------------------|
| [name] | service/module/layer/external | [what it does] | [language/framework] | [other components] |

Component types:

| Type | Description | How to Identify |
|------|-------------|-----------------|
| Service | Independent deployable unit | Separate process, own entry point |
| Module | Logical grouping within a service | Directory/package boundary |
| Layer | Horizontal slice (presentation, business, data) | Import direction, naming conventions |
| External | Third-party system | API calls, SDK usage |
| Infrastructure | Platform component | Database, queue, cache, CDN |
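Import direction, mentioned above as a way to identify layers, can be checked mechanically when the source is Python. This sketch assumes each top-level directory under `root` is one candidate module and only follows absolute imports; `module_dependencies` is a hypothetical helper name, and the standard-library `ast` module does the parsing.

```python
import ast
from collections import defaultdict
from pathlib import Path


def module_dependencies(root: str) -> dict[str, set[str]]:
    """Map each top-level package under root to the sibling packages it imports."""
    root_path = Path(root)
    packages = {p.name for p in root_path.iterdir() if p.is_dir()}
    deps: dict[str, set[str]] = defaultdict(set)
    for package in sorted(packages):
        for source in (root_path / package).rglob("*.py"):
            tree = ast.parse(source.read_text(encoding="utf-8"))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    names = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                    # level == 0 means an absolute import; relative imports stay
                    # inside the package and are ignored in this sketch.
                    names = [node.module]
                else:
                    continue
                for name in names:
                    target = name.split(".")[0]
                    if target in packages and target != package:
                        deps[package].add(target)
    return dict(deps)
```

If the result shows `api` importing `services` and `services` importing `data`, with nothing pointing the other way, that is evidence for a layered structure. Imports in both directions between two packages suggest the boundary is weaker than the directory layout implies.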

Step 3: Trace Data Flows

For each major user action or system event, trace the full path:

### Data Flow: [Action Name]

1. [Entry point] receives [input]
2. [Component A] validates and transforms
3. [Component B] applies business logic
4. [Storage] persists result
5. [Component C] sends notification
6. [Entry point] returns [output]

Identify:
- Request paths — user action to response
- Event flows — triggers, handlers, side effects
- Data pipelines — batch processing, ETL, sync jobs
- Background processes — cron jobs, workers, schedulers
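Traces are easier to keep consistent when they are recorded as data and rendered afterwards. The sketch below is one possible shape, not something the skill prescribes: `FlowStep`, `DataFlow`, and the "Create Order" example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FlowStep:
    component: str
    action: str


@dataclass
class DataFlow:
    name: str
    kind: str  # "request", "event", "pipeline", or "background"
    steps: list[FlowStep] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the trace in the numbered-list format of the template above."""
        lines = [f"### Data Flow: {self.name}", ""]
        lines += [f"{i}. {s.component} {s.action}" for i, s in enumerate(self.steps, start=1)]
        return "\n".join(lines)

    def components(self) -> list[str]:
        """Components touched by this flow, in first-seen order."""
        return list(dict.fromkeys(s.component for s in self.steps))


# Hypothetical example flow, for illustration only.
flow = DataFlow(
    name="Create Order",
    kind="request",
    steps=[
        FlowStep("API gateway", "receives POST /orders"),
        FlowStep("Order service", "validates the payload"),
        FlowStep("Order service", "applies pricing rules"),
        FlowStep("Postgres", "persists the order"),
        FlowStep("API gateway", "returns 201 with the order ID"),
    ],
)
print(flow.to_markdown())
```

Comparing `components()` across all traced flows against the component inventory from Step 2 is a cheap way to spot components that no flow touches, or flows that pass through something not yet inventoried.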

Step 4: Extract Data Model

### Data Model

| Entity | Storage | Key Fields | Relationships |
|--------|---------|------------|---------------|
| [name] | [postgres/mongo/redis/etc] | [important fields] | [belongs_to/has_many/references] |

### Storage Patterns

| Pattern | Where Used | Purpose |
|---------|-----------|---------|
| [e.g., soft delete] | [entities] | [why] |
| [e.g., JSONB columns] | [entities] | [why] |
| [e.g., event sourcing] | [entities] | [why] |
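When the schema is reachable, the entity table can be drafted directly from it. The following sketch assumes a SQLite database purely because Python ships a driver for it; other databases expose the same information through their own catalogs. `extract_entities` and `to_markdown_rows` are hypothetical helper names.

```python
import sqlite3


def extract_entities(conn: sqlite3.Connection) -> list[dict]:
    """Return one record per user table: name, columns, and foreign-key targets."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    entities = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        references = sorted(
            {row[2] for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        )
        entities.append({"entity": table, "fields": columns, "references": references})
    return entities


def to_markdown_rows(entities: list[dict], storage: str = "sqlite") -> list[str]:
    """Format entities as rows for the Data Model table."""
    rows = []
    for e in entities:
        relationships = ", ".join(f"references {t}" for t in e["references"])
        rows.append(f"| {e['entity']} | {storage} | {', '.join(e['fields'])} | {relationships} |")
    return rows
```

Declared foreign keys only cover relationships the schema enforces. Relationships maintained in application code still have to be found by reading the code, and are better marked with lower confidence.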

Step 5: Identify Interfaces

### API Surfaces

| Interface | Protocol | Format | Auth | Consumers |
|-----------|----------|--------|------|-----------|
| [name] | REST/GraphQL/gRPC/WebSocket | JSON/protobuf | [method] | [who calls it] |

### Internal Interfaces

| From | To | Mechanism | Contract |
|------|-----|-----------|----------|
| [component] | [component] | function call/event/queue/HTTP | [shape of data exchanged] |

Step 6: Catalog Cross-Cutting Concerns

| Concern | Implementation | Notes |
|---------|----------------|-------|
| Authentication | [method: JWT, session, OAuth, API key] | [provider, flow] |
| Authorization | [method: RBAC, ABAC, RLS] | [granularity] |
| Error handling | [strategy: exceptions, result types, error codes] | [propagation pattern] |
| Logging | [library, format, destination] | [structured? levels?] |
| Configuration | [method: env vars, config files, feature flags] | [per-environment?] |
| Caching | [what, where, TTL strategy] | [invalidation method] |
| Monitoring | [metrics, health checks, alerting] | [tools used] |
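Some of these concerns leave searchable fingerprints in the code. As one example, the Configuration row can be seeded by listing the environment variables a Python codebase reads. This is a rough sketch under stated assumptions: `ENV_PATTERN` only recognizes literal names passed to `os.getenv`, `os.environ.get`, or `os.environ[...]`, and `find_env_vars` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches os.getenv("NAME"), os.environ.get("NAME"), and os.environ["NAME"]
# when NAME is an upper-case string literal. Dynamic names are not detected.
ENV_PATTERN = re.compile(
    r"""os\.(?:getenv\(|environ\.get\(|environ\[)\s*["']([A-Z0-9_]+)["']"""
)


def find_env_vars(root: str) -> dict[str, list[str]]:
    """Map each environment variable name to the files (relative to root) that read it."""
    found: dict[str, list[str]] = {}
    for source in sorted(Path(root).rglob("*.py")):
        text = source.read_text(encoding="utf-8")
        for name in ENV_PATTERN.findall(text):
            relative = str(source.relative_to(root))
            if relative not in found.setdefault(name, []):
                found[name].append(relative)
    return found
```

A variable read from many files hints that configuration is not centralized, which is worth recording in the Notes column. A text search like this misses configuration loaded through a settings library, so treat an empty result as "not found", not "not used".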

Step 7: Document Patterns

Identify architectural patterns in use:

### Architectural Patterns

| Pattern | Where Applied | Confidence |
|---------|--------------|------------|
| [e.g., Layered architecture] | [whole system / specific service] | High/Medium/Low |
| [e.g., Event-driven] | [specific subsystem] | |
| [e.g., Repository pattern] | [data access layer] | |
| [e.g., CQRS] | [read vs write paths] | |

### Conventions

| Convention | Example | Consistency |
|-----------|---------|-------------|
| [naming] | [example from code] | [how consistently followed] |
| [file structure] | [example layout] | |
| [error format] | [example error] | |

Step 8: Capture Constraints and Quality Attributes

### Quality Attributes (What the Architecture Optimizes For)

| Attribute | Evidence | Priority |
|-----------|----------|----------|
| [e.g., Developer velocity] | [simple stack, few abstractions] | Inferred: High |
| [e.g., Scalability] | [stateless services, queue-based workers] | Inferred: Medium |

### Constraints

| Constraint | Source | Impact |
|-----------|--------|--------|
| [e.g., Must run on single server] | [infrastructure setup] | [limits horizontal scaling] |
| [e.g., Python-only team] | [all code is Python] | [technology choices] |

ARCHITECTURE.md Template

# [System Name] Architecture

## Overview
[One paragraph: what the system does and how it's structured]

## Components
[Component inventory table from Step 2]

### Component Diagram
[ASCII diagram showing components and connections]

## Data Model
[Entity table and storage patterns from Step 4]

## Data Flows
[Major data flow traces from Step 3]

## Interfaces
[API surfaces and internal interfaces from Step 5]

## Cross-Cutting Concerns
[Table from Step 6]

## Architectural Patterns
[Patterns and conventions from Step 7]

## Quality Attributes and Constraints
[From Step 8]

## Source Material
[What was analyzed, confidence level per section]
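If the per-step outputs are collected as Markdown fragments, assembling the final document is mechanical. The sketch below assumes the section titles of the template above; `SECTIONS`, `render_architecture`, and the `[Not yet extracted]` placeholder are illustrative choices, not part of the skill.

```python
# Section titles in the order used by the ARCHITECTURE.md template.
SECTIONS = [
    "Overview",
    "Components",
    "Data Model",
    "Data Flows",
    "Interfaces",
    "Cross-Cutting Concerns",
    "Architectural Patterns",
    "Quality Attributes and Constraints",
    "Source Material",
]


def render_architecture(system_name: str, sections: dict[str, str]) -> str:
    """Assemble ARCHITECTURE.md text, marking sections that have no content yet."""
    parts = [f"# {system_name} Architecture"]
    for title in SECTIONS:
        body = sections.get(title, "").strip() or "[Not yet extracted]"
        parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts) + "\n"
```

Emitting an explicit placeholder for empty sections keeps gaps visible, which matters because downstream readers need to tell "nothing found" apart from "not looked at yet".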

Confidence Levels

Not all extracted architecture has equal certainty. Mark confidence:

| Level | Meaning | When to Apply |
|-------|---------|---------------|
| High | Directly observed in code/schema | Read it in source |
| Medium | Inferred from patterns | Multiple clues point to this |
| Low | Best guess from limited info | Single clue or user description only |
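The table can be read as a small decision rule. The sketch below encodes it directly; the `Confidence` enum, the `rate` function, and the threshold of two clues for "multiple" are assumptions made for illustration.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


def rate(observed_in_source: bool, supporting_clues: int) -> Confidence:
    """Assign a confidence level following the table above.

    observed_in_source: the finding was read directly in code or schema.
    supporting_clues: number of independent indirect clues (naming, structure,
    user statements) that point to the finding.
    """
    if observed_in_source:
        return Confidence.HIGH
    if supporting_clues >= 2:  # "multiple clues" is taken to mean two or more
        return Confidence.MEDIUM
    return Confidence.LOW
```

For example, `rate(False, 1)` returns `Confidence.LOW`, which matches a pattern claimed only in a user's verbal description.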

Relationship to Other Skills

| Skill | Relationship |
|-------|--------------|
| architect | Architect designs new architecture; extractor documents existing architecture |
| stack-analyzer | Extractor output feeds stack-analyzer for transposition mapping |
| spec | Extracted architecture informs spec compilation |
| code-verification | Extraction may reveal architectural issues |

Key Principles

Extract what IS, not what SHOULD BE. Document the actual architecture, not an idealized version. Note gaps and issues but don't fix them during extraction.

Trace code over documentation. Code is truth. Documentation may be outdated. When they conflict, trust the code.

Confidence matters. Mark uncertain sections. Downstream skills (stack-analyzer, spec) need to know what's solid and what's inferred.

Components over code. Focus on structural elements and their relationships, not implementation details. The goal is architectural understanding, not a code walkthrough.

One pass is not enough. Initial survey reveals structure; data flow tracing reveals hidden connections. Plan for at least two passes through the source material.
