npx skills add eddiebe147/claude-settings --skill "infrastructure-documenter"
Install specific skill from multi-skill repository
# Description
Expert guide for documenting infrastructure including architecture diagrams, runbooks, system documentation, and operational procedures. Use when creating technical documentation for systems and deployments.
# SKILL.md
name: infrastructure-documenter
description: Expert guide for documenting infrastructure including architecture diagrams, runbooks, system documentation, and operational procedures. Use when creating technical documentation for systems and deployments.
# Infrastructure Documenter Skill
## Overview
This skill helps you create clear, maintainable infrastructure documentation. It covers architecture diagrams, runbooks, system documentation, operational procedures, and documentation-as-code practices.
## Documentation Philosophy
### Principles
- Living documentation: Keep it in sync with reality
- Audience-aware: Different docs for different readers
- Actionable: Every doc should help someone do something
- Version-controlled: Documentation changes tracked with code
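One way to act on the "living documentation" principle is to flag pages that have not been touched for a while. The sketch below is only an illustration: the function name `stale_docs`, the `docs/` directory, and the 90-day threshold are assumptions, not part of this skill. It relies on filesystem modification times, which are reset by a fresh `git clone`, so it suits a long-lived working copy better than a CI checkout.

```shell
#!/bin/sh
# Hypothetical helper: list Markdown files under a docs directory whose
# modification time is more than max_age_days whole days in the past.
stale_docs() {
    docs_dir=$1
    max_age_days=$2
    # find's -mtime +N matches files whose age in whole days exceeds N.
    find "$docs_dir" -type f -name '*.md' -mtime +"$max_age_days" | sort
}
```

For example, `stale_docs docs 90` prints one path per line for every page that is a candidate for review.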
### Document Types
| Type | Audience | Purpose |
|---|---|---|
| Architecture | Engineers | Understand system design |
| Runbooks | Ops/SRE | Handle incidents |
| API Docs | Developers | Integrate with system |
| Onboarding | New hires | Get up to speed |
| Decision Records | Future you | Understand why |
## Architecture Documentation
### System Architecture Overview
# System Architecture
## Overview
[Project Name] is a [type] application that [purpose].
## High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                            Users                            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Vercel Edge                         │
│  ┌─────────────────┐    ┌─────────────────┐                 │
│  │   Next.js App   │    │ Edge Functions  │                 │
│  └─────────────────┘    └─────────────────┘                 │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│    Supabase     │   │      Redis      │   │     Stripe      │
│  - PostgreSQL   │   │  - Session      │   │  - Payments     │
│  - Auth         │   │  - Cache        │   │  - Webhooks     │
│  - Realtime     │   │                 │   │                 │
│  - Storage      │   │                 │   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘
## Components
### Frontend (Next.js App)
- **Location**: Vercel Edge Network
- **Framework**: Next.js 14 (App Router)
- **Styling**: Tailwind CSS + shadcn/ui
- **State**: Zustand + React Query
### Backend Services
| Service | Provider | Purpose |
|---------|----------|---------|
| Database | Supabase | PostgreSQL with RLS |
| Auth | Supabase Auth | User authentication |
| Storage | Supabase Storage | File uploads |
| Cache | Upstash Redis | Session & API cache |
| Payments | Stripe | Subscriptions |
| Email | Resend | Transactional emails |
### Data Flow
1. User request → Vercel Edge
2. SSR/API Route processes request
3. Database queries via Supabase client
4. Response cached at edge (when applicable)
5. Response returned to user
## Security
### Authentication Flow
1. User signs in via Supabase Auth
2. JWT issued and stored in a cookie
3. Server validates token on each request
4. RLS policies enforce data access
### Data Protection
- All data encrypted at rest (AES-256)
- TLS 1.3 for data in transit
- Secrets stored in Vercel environment
- PII fields encrypted in database
### Mermaid Diagrams
## Request Flow
```mermaid
sequenceDiagram
    participant U as User
    participant V as Vercel
    participant N as Next.js
    participant S as Supabase
    participant R as Redis
    U->>V: HTTPS Request
    V->>N: Route to App
    alt Cached Response
        N->>R: Check Cache
        R-->>N: Cache Hit
        N-->>U: Return Cached
    else Cache Miss
        N->>S: Query Database
        S-->>N: Data
        N->>R: Store in Cache
        N-->>U: Return Response
    end
```
## Database Schema
```mermaid
erDiagram
    users ||--o{ projects : owns
    users {
        uuid id PK
        text email
        text name
        timestamp created_at
    }
    projects ||--o{ tasks : contains
    projects {
        uuid id PK
        uuid user_id FK
        text name
        text status
    }
    tasks {
        uuid id PK
        uuid project_id FK
        text title
        boolean completed
    }
```
## Runbooks
### Runbook Template
````markdown
# Runbook: [Service Name] - [Issue Type]
## Overview
Brief description of the issue and when this runbook applies.
## Severity
- **P1 (Critical)**: Complete outage
- **P2 (High)**: Degraded service
- **P3 (Medium)**: Minor impact
- **P4 (Low)**: No user impact
## Detection
How this issue is typically detected:
- [ ] Alert from [monitoring system]
- [ ] User report
- [ ] Automated check failure
## Impact Assessment
- **Users affected**: All / Segment / None
- **Data at risk**: Yes / No
- **Revenue impact**: High / Medium / Low / None
## Prerequisites
- [ ] Access to [system/dashboard]
- [ ] Credentials for [service]
- [ ] Contact info for [team/person]
## Resolution Steps
### Step 1: Verify the Issue
```bash
# Check service status
curl -I https://api.example.com/health
# Check logs
vercel logs --follow
```
### Step 2: Identify Root Cause
Common causes:
- [ ] Database connection pool exhausted
- [ ] Memory limit reached
- [ ] External service down
- [ ] Bad deployment
### Step 3: Apply Fix
#### If Database Issue:
```sql
-- Check connection count
SELECT count(*) FROM pg_stat_activity;
-- Kill connections that have been idle for more than an hour
-- (state_change records when the session entered its current state)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle' AND state_change < now() - interval '1 hour';
```
#### If Bad Deployment:
```bash
# Rollback to previous deployment
vercel rollback
```
### Step 4: Verify Fix
```bash
# Check service health
curl https://api.example.com/health
# Monitor error rates for 15 minutes
```
## Escalation
If unable to resolve within 30 minutes:
1. Page on-call engineer: [contact]
2. Notify stakeholders in #incidents
3. Update status page
## Post-Incident
- [ ] Create incident report
- [ ] Schedule post-mortem (P1/P2 only)
- [ ] Update this runbook if needed
## Related Links
````
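The "Verify Fix" step in the runbook template can be partly automated. The following is a minimal sketch, assuming the health endpoint answers HTTP 200 when the service is healthy; the function name `wait_healthy` and its argument order are hypothetical, and it checks endpoint health only, not error rates from a monitoring system.

```shell
#!/bin/sh
# Hypothetical helper: poll a health endpoint until it returns HTTP 200
# `required` times in a row, giving up after `max_attempts` requests.
wait_healthy() {
    url=$1
    required=${2:-3}       # consecutive successes needed
    max_attempts=${3:-10}  # total requests before giving up
    delay=${4:-5}          # seconds to wait between requests
    ok=0
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        attempt=$((attempt + 1))
        # -s hides progress, -o discards the body, -w prints only the status code.
        status=$(curl -s -o /dev/null -w '%{http_code}' "$url") || status=000
        if [ "$status" = "200" ]; then
            ok=$((ok + 1))
            if [ "$ok" -ge "$required" ]; then
                return 0
            fi
        else
            ok=0  # any failure resets the streak
        fi
        sleep "$delay"
    done
    return 1
}
```

Under these assumptions, `wait_healthy https://api.example.com/health 180 180 5` requires 180 consecutive healthy responses five seconds apart, which is roughly the 15-minute observation window the template mentions.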
### Database Runbooks
````markdown
# Runbook: Database Performance Issues
## Symptoms
- Slow API responses (>1s)
- Timeout errors in logs
- High database CPU in dashboard
## Quick Checks
### 1. Check Active Connections
```sql
SELECT
  state,
  count(*),
  max(now() - query_start) AS max_duration
FROM pg_stat_activity
GROUP BY state;
```
### 2. Find Long-Running Queries
```sql
SELECT
  pid,
  now() - query_start AS duration,
  query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 seconds'
ORDER BY duration DESC;
```
### 3. Check Table Sizes
```sql
-- format('%I.%I', ...) quotes identifiers so mixed-case table names resolve
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 10;
```
### 4. Check Missing Indexes
```sql
-- idx_scan is NULL for tables without any index, so treat it as 0
SELECT
  relname,
  seq_scan,
  idx_scan,
  seq_scan - coalesce(idx_scan, 0) AS difference
FROM pg_stat_user_tables
WHERE seq_scan > coalesce(idx_scan, 0)
ORDER BY difference DESC;
```
## Resolution
### Kill Problematic Queries
```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = [PID_FROM_ABOVE];
```
### Add Missing Index
```sql
CREATE INDEX CONCURRENTLY idx_table_column
ON table_name (column_name);
```
````
## Decision Records (ADRs)
### ADR Template
```markdown
# ADR-001: Choose Supabase for Database
## Status
Accepted
## Context
We need a database solution for [Project Name] that supports:
- PostgreSQL compatibility
- Real-time subscriptions
- Built-in authentication
- Easy local development
- Generous free tier
## Decision
We will use Supabase as our primary database and auth provider.
## Alternatives Considered
### PlanetScale
**Pros:**
- Excellent scaling
- Branching for schema changes
- MySQL compatible
**Cons:**
- No built-in auth
- No real-time subscriptions
- Additional services needed
### Firebase
**Pros:**
- Real-time built-in
- Mature platform
- Good mobile SDKs
**Cons:**
- NoSQL (not ideal for our use case)
- Vendor lock-in concerns
- Complex security rules
## Consequences
### Positive
- Single provider for DB + Auth + Storage
- Great developer experience
- Row Level Security for data protection
- Local development with supabase CLI
### Negative
- Reliance on Supabase-specific features (Auth, Realtime, Storage) ties us to the provider
- Supabase still maturing (some rough edges)
- Limited to their managed offering
### Risks
- Supabase scaling limitations at high traffic
- Migration cost if we need to move
## References
- [Supabase Documentation](https://supabase.com/docs)
- [Comparison: Supabase vs Firebase](https://...)
```
## API Documentation
### Endpoint Documentation
# API Reference
## Base URL
- Production: https://api.example.com/v1
- Staging: https://staging-api.example.com/v1
## Authentication
All API requests require authentication via a Bearer token.
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.example.com/v1/users/me
```
## Endpoints
### Users
#### Get Current User
```
GET /users/me
```
Response:
```json
{
  "id": "usr_123",
  "email": "user@example.com",
  "name": "John Doe",
  "created_at": "2024-01-01T00:00:00Z"
}
```
#### Update User
```
PATCH /users/me
```
Request Body:
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | No | Display name |
| avatar_url | string | No | Profile image URL |

Example:
```bash
curl -X PATCH \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Jane Doe"}' \
  https://api.example.com/v1/users/me
```
## Error Responses
| Status | Code | Description |
|---|---|---|
| 400 | BAD_REQUEST | Invalid request body |
| 401 | UNAUTHORIZED | Missing or invalid token |
| 403 | FORBIDDEN | Insufficient permissions |
| 404 | NOT_FOUND | Resource not found |
| 429 | RATE_LIMITED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |

Error Response Format:
```json
{
  "error": {
    "code": "NOT_FOUND",
    "message": "User not found"
  }
}
```
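The error table and response format above can be turned into a crude smoke check that the API still matches its documentation. This is a sketch only: both function names are hypothetical, and the body is matched with `grep` as plain text rather than parsed as JSON, which is an assumption made to avoid extra dependencies.

```shell
#!/bin/sh
# Map an HTTP status to the error code documented in the table above.
error_code_for_status() {
    case $1 in
        400) echo BAD_REQUEST ;;
        401) echo UNAUTHORIZED ;;
        403) echo FORBIDDEN ;;
        404) echo NOT_FOUND ;;
        429) echo RATE_LIMITED ;;
        500) echo INTERNAL_ERROR ;;
        *)   return 1 ;;  # status is not in the documented table
    esac
}

# Exit 0 if the response body carries the code documented for the status,
# 1 if it carries something else, 2 if the status itself is undocumented.
check_error_response() {
    status=$1
    body=$2
    expected=$(error_code_for_status "$status") || return 2
    printf '%s' "$body" | grep -q "\"code\"[[:space:]]*:[[:space:]]*\"$expected\""
}
```

A documentation test could call `check_error_response 404 "$body"` with the body captured from a request for a resource that does not exist.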
## Environment Documentation
### Environment Matrix
````markdown
# Environments
## Overview
| Environment | URL | Purpose | Deploy |
|-------------|-----|---------|--------|
| Production | https://myapp.com | Live users | Manual (main) |
| Staging | https://staging.myapp.com | Pre-release testing | Auto (main) |
| Preview | https://pr-*.vercel.app | PR review | Auto (PR) |
| Development | http://localhost:3000 | Local dev | Manual |
## Configuration
### Production
```env
NODE_ENV=production
DATABASE_URL=[Supabase Production]
NEXT_PUBLIC_APP_URL=https://myapp.com
```
### Staging
```env
NODE_ENV=production
DATABASE_URL=[Supabase Staging Branch]
NEXT_PUBLIC_APP_URL=https://staging.myapp.com
```
### Development
```env
NODE_ENV=development
DATABASE_URL=[Local Supabase]
NEXT_PUBLIC_APP_URL=http://localhost:3000
```
## Access
### Production
- Vercel: Admin only
- Database: Read-only for devs, write for admin
- Logs: All engineers
### Staging
- Vercel: All engineers
- Database: All engineers
- Logs: All engineers
## Secrets Rotation
| Secret | Rotation | Last Rotated |
|---|---|---|
| Database password | 90 days | 2024-01-15 |
| API keys | 90 days | 2024-01-15 |
| JWT secret | Never | Initial setup |
````
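A rotation table like the one above is only useful if someone checks it. The sketch below shows one possible check, assuming GNU `date` (for the `-d` option) and ISO `YYYY-MM-DD` dates; the function name `rotation_due` is hypothetical, and rows with a rotation of "Never" are outside what it handles.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when a secret last rotated on `last_rotated`
# is at least `interval_days` old as of `as_of` (defaults to today, UTC).
# Assumes GNU date; returns 2 if a date cannot be parsed.
rotation_due() {
    last_rotated=$1
    interval_days=$2
    as_of=${3:-$(date -u +%Y-%m-%d)}
    last_s=$(date -u -d "$last_rotated" +%s) || return 2
    now_s=$(date -u -d "$as_of" +%s) || return 2
    # Whole days elapsed between the two dates.
    age_days=$(( (now_s - last_s) / 86400 ))
    [ "$age_days" -ge "$interval_days" ]
}
```

With the table's sample values, `rotation_due 2024-01-15 90` succeeds once 90 days have passed since 15 January 2024, which makes it usable as a condition in a scheduled reminder job.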
## Documentation-as-Code
### Documentation Structure
docs/
├── README.md                 # Documentation index
├── architecture/
│   ├── overview.md           # System architecture
│   ├── data-flow.md          # Data flow diagrams
│   └── decisions/            # ADRs
│       ├── 001-database.md
│       └── 002-hosting.md
├── runbooks/
│   ├── README.md             # Runbook index
│   ├── database.md           # Database issues
│   ├── deployment.md         # Deployment issues
│   └── outage.md             # Service outage
├── api/
│   └── reference.md          # API documentation
└── onboarding/
    ├── setup.md              # Local setup
    └── contributing.md       # How to contribute
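The layout above can be scaffolded with a few lines of shell. This is a minimal sketch under the assumption that empty placeholder files are acceptable starting points; the function name `scaffold_docs` is hypothetical, and the file list simply mirrors the tree.

```shell
#!/bin/sh
# Hypothetical helper: create the documentation tree shown above under
# `root` (default: docs) without touching files that already exist.
scaffold_docs() {
    root=${1:-docs}
    mkdir -p "$root/architecture/decisions" "$root/runbooks" \
             "$root/api" "$root/onboarding"
    for f in \
        README.md \
        architecture/overview.md \
        architecture/data-flow.md \
        architecture/decisions/001-database.md \
        architecture/decisions/002-hosting.md \
        runbooks/README.md \
        runbooks/database.md \
        runbooks/deployment.md \
        runbooks/outage.md \
        api/reference.md \
        onboarding/setup.md \
        onboarding/contributing.md
    do
        # Create an empty file only when nothing is there yet.
        [ -e "$root/$f" ] || : > "$root/$f"
    done
}
```

Running `scaffold_docs` from the repository root is safe to repeat, since existing pages are left as they are.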
### Auto-Generated Documentation
```yaml
# .github/workflows/docs.yml
name: Generate Docs
on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'docs/**'
# The deploy step pushes to the gh-pages branch, which needs write access
permissions:
  contents: write
jobs:
  generate-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate API docs from OpenAPI
        run: |
          npx @redocly/cli build-docs openapi.yaml \
            --output docs/api/index.html
      - name: Generate TypeDoc
        run: npx typedoc --out docs/api/typescript
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs
```
## Documentation Checklist
### Architecture Docs
- [ ] System overview diagram
- [ ] Component descriptions
- [ ] Data flow documentation
- [ ] Security architecture
- [ ] Technology decisions (ADRs)
### Operational Docs
- [ ] Runbooks for common issues
- [ ] Deployment procedures
- [ ] Monitoring and alerting
- [ ] Incident response plan
- [ ] On-call procedures
### Developer Docs
- [ ] Local setup guide
- [ ] API reference
- [ ] Contributing guidelines
- [ ] Code conventions
- [ ] Testing guide
### Maintenance
- [ ] Documentation review schedule
- [ ] Ownership assigned
- [ ] Change process defined
- [ ] Versioning strategy
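If the checklist above is kept as a Markdown file in the repository, progress can be tracked by counting open items. The following is a small sketch; the function name `unchecked_items` and the idea of storing the checklist in its own file are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print how many "- [ ]" items remain unchecked in a
# Markdown checklist file.
unchecked_items() {
    # grep -c prints the number of matching lines but exits 1 when that
    # number is 0, so the exit status is deliberately ignored.
    grep -c '^[[:space:]]*- \[ \]' "$1" || true
}
```

A review job could, for instance, fail when `unchecked_items docs/checklist.md` prints anything other than `0`, where `docs/checklist.md` is a placeholder path.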
## When to Use This Skill
Invoke this skill when:
- Creating architecture documentation
- Writing runbooks for operations
- Documenting decision rationale (ADRs)
- Setting up documentation structure
- Creating onboarding materials
- Building automated documentation
- Planning incident response procedures