# pipeline-discovery

A skill from the miles-knowbl/orchestrator repository.
# Install this skill:
npx skills add miles-knowbl/orchestrator --skill "pipeline-discovery"

Installs this specific skill from the multi-skill repository.

# Description

Identify backend data pipelines (P-series) in the codebase. Discovers server-side data flows triggered by user actions or system events, documenting triggers, steps, and outcomes. Foundation for MECE failure mode analysis.

# SKILL.md


name: pipeline-discovery
description: "Identify backend data pipelines (P-series) in the codebase. Discovers server-side data flows triggered by user actions or system events, documenting triggers, steps, and outcomes. Foundation for MECE failure mode analysis."
phase: INIT
category: core
version: "1.0.0"
depends_on: [requirements]
tags: [audit, pipeline, discovery, backend, data-flow]


# Pipeline Discovery

Identify backend data pipelines (P-series).

## When to Use

- Starting an audit - runs in the INIT phase to map backend flows
- Understanding data flows - document how data moves through the system
- Preparing for failure mode analysis - identify what can break
- When you say: "find the pipelines", "map the backend", "what data flows exist?"

## Reference Requirements

MUST read before applying this skill:

| Reference | Why Required |
|-----------|--------------|
| pipeline-identification.md | How to find pipelines in code |
| pipeline-template.md | How to document each pipeline |

Read if applicable:

| Reference | When Needed |
|-----------|-------------|
| common-patterns.md | Recognize typical pipeline patterns |

Verification: All major backend data flows are documented with triggers and outcomes.

## Required Deliverables

| Deliverable | Location | Condition |
|-------------|----------|-----------|
| Pipeline inventory | AUDIT-SCOPE.md | Always (P-series section) |
| State update | audit-state.json | Always (backend_pipelines array) |

## Core Concept

Pipeline Discovery answers: "What are the major backend data flows?"

A pipeline is:
- Triggered by user action or system event
- Processes data through multiple steps
- Produces a persistent outcome

Examples:
- P1: Source Ingestion (file upload → parsed schema)
- P2: Content Generation (generate button → artifact created)
- P3: Publishing (publish button → post live on platform)
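
To make the trigger/steps/outcome framing concrete, a discovered pipeline can be held as a small record. A minimal TypeScript sketch, mirroring the field names used in audit-state.json later in this skill:

```typescript
// One discovered pipeline, using the same field names as audit-state.json below.
interface Pipeline {
  id: string;          // P-series ID, e.g. "P1"
  name: string;        // short name, e.g. "Source Ingestion"
  trigger: string;     // user action or system event that starts the flow
  outcome: string;     // persistent result (database write, external effect)
  key_files: string[]; // entry point plus main processing files
  step_count: number;  // number of documented steps
}

// Example entry corresponding to P1 above.
const p1: Pipeline = {
  id: "P1",
  name: "Source Ingestion",
  trigger: "File upload via /sources/upload",
  outcome: "source_schema populated",
  key_files: ["api/sources/upload/route.ts", "lib/parsers/index.ts"],
  step_count: 6,
};
```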

## Pipeline Identification

┌──────────────────────────────────────────────────────────┐
│               PIPELINE DISCOVERY PROCESS                 │
│                                                          │
│  1. FIND ENTRY POINTS                                    │
│     ├─→ API routes (POST, PUT, DELETE)                   │
│     ├─→ Background job handlers                          │
│     ├─→ Webhook receivers                                │
│     └─→ Event listeners                                  │
│                                                          │
│  2. TRACE DATA FLOW                                      │
│     ├─→ Input validation/parsing                         │
│     ├─→ Business logic                                   │
│     ├─→ External service calls                           │
│     └─→ Database writes                                  │
│                                                          │
│  3. DOCUMENT EACH PIPELINE                               │
│     ├─→ Trigger (what starts it)                         │
│     ├─→ Steps (what happens)                             │
│     └─→ Outcome (what it produces)                       │
│                                                          │
│  4. ASSIGN P-SERIES IDS                                  │
│     └─→ P1, P2, P3... in order of discovery              │
└──────────────────────────────────────────────────────────┘
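
Step 1 can be partially mechanized. A rough sketch (an illustration, not part of the skill) that walks a repository and prints files matching the entry-point path conventions listed in the next section:

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Path patterns that usually indicate pipeline entry points (see "Where to Look").
const ENTRY_HINTS = [
  /app\/api\/.+\/route\.ts$/, // Next.js API routes
  /(routes|controllers)\//,   // Express-style routes
  /(jobs|workers|queues)\//,  // background job handlers
];

function findEntryPoints(dir: string, found: string[] = []): string[] {
  for (const name of readdirSync(dir)) {
    if (name === "node_modules" || name === ".git") continue;
    const full = join(dir, name);
    if (statSync(full).isDirectory()) {
      findEntryPoints(full, found);
    } else if (ENTRY_HINTS.some((re) => re.test(full))) {
      found.push(full);
    }
  }
  return found;
}

console.log(findEntryPoints(process.cwd()).join("\n"));
```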

## Where to Look

### API Routes

// Next.js API routes
app/api/*/route.ts

// Express routes
routes/*.ts
controllers/*.ts
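
For example, an entry point for a P1-style ingestion flow might look like the hypothetical handler below; the `extractSchema` and `saveSource` helpers are invented stand-ins for the project's real modules:

```typescript
// app/api/sources/upload/route.ts (hypothetical) - trigger for a Source Ingestion pipeline.
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const form = await req.formData();             // Trigger: user uploads a file
  const file = form.get("file");
  if (!(file instanceof File)) {
    return NextResponse.json({ error: "file required" }, { status: 400 }); // Step: validation
  }
  const text = await file.text();                // Step: parse content
  const schema = extractSchema(text);            // Step: business logic
  await saveSource({ name: file.name, schema }); // Outcome: persistent database write
  return NextResponse.json({ ok: true });
}

// Invented helpers standing in for modules like lib/schema-extractor.ts and lib/db/sources.ts.
declare function extractSchema(text: string): unknown;
declare function saveSource(row: { name: string; schema: unknown }): Promise<void>;
```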

### Background Jobs

// Job processors
jobs/*.ts
workers/*.ts
queues/*.ts
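
Background entry points follow the same trigger → steps → outcome shape. A hypothetical handler sketch; the queue wiring and data helpers are assumptions, not part of any particular codebase:

```typescript
// jobs/daily-report.ts (hypothetical) - processor invoked by a cron or queue trigger.
export async function processDailyReport(runDate: string): Promise<void> {
  const rows = await fetchMetrics(runDate);                    // Step: query source data
  const total = rows.reduce((sum, row) => sum + row.value, 0); // Step: aggregate
  await storeReport({ runDate, total });                       // Outcome: persistent report row
}

// Invented data-access helpers; a real project would call its DB client here.
declare function fetchMetrics(runDate: string): Promise<{ value: number }[]>;
declare function storeReport(report: { runDate: string; total: number }): Promise<void>;
```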

### Database Operations

// Database writes indicate pipeline endpoints
supabase.from('table').insert()
prisma.model.create()

### External Services

// External API calls often indicate pipelines
openai.chat.completions.create()
twitter.post()
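
These two signals often appear together: a generation-style pipeline typically ends in both an external call and a database write. A sketch assuming the OpenAI Node SDK and a hypothetical Prisma `artifact` model:

```typescript
// Hypothetical core of a Content Generation pipeline (P2-style).
import OpenAI from "openai";
import { PrismaClient } from "@prisma/client";

const openai = new OpenAI();
const prisma = new PrismaClient();

export async function generateArtifact(prompt: string) {
  // Step: external service call (LLM)
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });
  const content = completion.choices[0]?.message?.content ?? "";

  // Outcome: persistent artifact row (the `artifact` model name is illustrative)
  return prisma.artifact.create({ data: { prompt, content } });
}
```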

## Pipeline Documentation Format

### P1: Source Ingestion

**Trigger:** User uploads file via /sources/upload
**Frequency:** ~50/day

**Steps:**
1. File received at `api/sources/upload/route.ts:23`
2. File type validated (`lib/validators.ts:45`)
3. Content parsed by type (`lib/parsers/index.ts:12`)
4. Schema extracted (`lib/schema-extractor.ts:78`)
5. Source record created (`lib/db/sources.ts:34`)
6. Embedding generated (`lib/embeddings.ts:56`)

**Outcome:**
- `sources` table: new row with metadata
- `source_embeddings` table: vector for search
- `source_schema` JSON: extracted structure

**Key Files:**
- `api/sources/upload/route.ts`
- `lib/parsers/*.ts`
- `lib/schema-extractor.ts`

## Output Format

### In AUDIT-SCOPE.md

## Backend Pipelines (P-series)

| ID | Name | Trigger | Outcome |
|----|------|---------|---------|
| P1 | Source Ingestion | File upload | source_schema populated |
| P2 | Content Generation | Generate button | Artifact created |
| P3 | Publishing | Publish button | Post live on platform |

### P1: Source Ingestion
[detailed documentation]

### P2: Content Generation
[detailed documentation]

### In audit-state.json

{
  "backend_pipelines": [
    {
      "id": "P1",
      "name": "Source Ingestion",
      "trigger": "File upload via /sources/upload",
      "outcome": "source_schema populated",
      "key_files": ["api/sources/upload/route.ts", "lib/parsers/index.ts"],
      "step_count": 6
    }
  ]
}
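
Recording a discovered pipeline can be a small helper over this file. A sketch, assuming audit-state.json sits at the repository root and the array may not exist yet:

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Append one discovered pipeline to the backend_pipelines array in audit-state.json.
function recordPipeline(
  entry: {
    id: string;
    name: string;
    trigger: string;
    outcome: string;
    key_files: string[];
    step_count: number;
  },
  path = "audit-state.json",
): void {
  const state = JSON.parse(readFileSync(path, "utf8"));
  state.backend_pipelines = [...(state.backend_pipelines ?? []), entry];
  writeFileSync(path, JSON.stringify(state, null, 2) + "\n");
}
```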

## Discovery Checklist

- [ ] All POST/PUT/DELETE API routes examined
- [ ] Background job handlers identified
- [ ] Database write operations traced
- [ ] External API calls documented
- [ ] Each pipeline has trigger, steps, outcome
- [ ] P-series IDs assigned consistently

## Common Pipeline Patterns

| Pattern | Example | Indicators |
|---------|---------|------------|
| CRUD Create | User registration | POST route → validate → insert |
| File Processing | Document upload | POST multipart → parse → store |
| Generation | AI content | POST → LLM call → store result |
| Publishing | Social post | POST → external API → update status |
| Batch Job | Daily report | Cron → query → aggregate → email |
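
As one worked instance from the table, a Publishing pipeline chains an external API call with a status update. A hypothetical sketch; the platform client and helpers are invented for illustration:

```typescript
// Hypothetical core of a Publishing pipeline (P3-style): POST → external API → update status.
export async function publishPost(postId: string): Promise<void> {
  const post = await loadPost(postId);                        // Step: load the draft
  const externalId = await platformClient.publish(post.body); // Step: external platform call
  await markPublished(postId, externalId);                    // Outcome: status persisted
}

// Invented stand-ins for the project's platform client and DB helpers.
declare const platformClient: { publish(body: string): Promise<string> };
declare function loadPost(id: string): Promise<{ body: string }>;
declare function markPublished(id: string, externalId: string): Promise<void>;
```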

## Validation

Before completing, verify:

- [ ] All major data flows are documented
- [ ] Each pipeline has a unique P-series ID
- [ ] Triggers are user-observable actions or system events
- [ ] Outcomes are persistent (database writes, external effects)
- [ ] Key files are identified for each pipeline
- [ ] Step counts are accurate

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.