infrastructure-skill-builder

by @jackspace in DevOps & Cloud

# Install this skill:

npx skills add jackspace/ClaudeSkillz --skill "infrastructure-skill-builder"

Install specific skill from multi-skill repository

# Description

Transform infrastructure documentation, runbooks, and operational knowledge into reusable Claude Code skills. Convert Proxmox configs, Docker setups, Kubernetes deployments, and cloud infrastructure patterns into structured, actionable skills.

# SKILL.md

name: infrastructure-skill-builder
description: Transform infrastructure documentation, runbooks, and operational knowledge into reusable Claude Code skills. Convert Proxmox configs, Docker setups, Kubernetes deployments, and cloud infrastructure patterns into structured, actionable skills.
license: MIT
tags: [infrastructure, documentation, skill-creation, knowledge-capture, iac]

Infrastructure Skill Builder

Convert your infrastructure documentation into powerful, reusable Claude Code skills.

Overview

Infrastructure knowledge is often scattered across:
- README files
- Runbooks and wiki pages
- Configuration files
- Troubleshooting guides
- Team Slack/Discord history
- Mental models of senior engineers

This skill helps you systematically capture that knowledge as Claude Code skills for:
- Faster onboarding
- Consistent operations
- Disaster recovery
- Knowledge preservation
- Team scaling

When to Use

Use this skill when:
- Documenting complex infrastructure setups
- Creating runbooks for operations teams
- Onboarding new team members to infrastructure
- Preserving expert knowledge before team changes
- Standardizing infrastructure operations
- Building organizational infrastructure library
- Migrating from manual to automated operations

Skill Extraction Process

Step 1: Identify Infrastructure Domains

Common domains:
- Container Orchestration: Docker, Kubernetes, Proxmox LXC
- Cloud Platforms: AWS, GCP, Azure, DigitalOcean
- Databases: PostgreSQL, MongoDB, Redis, MySQL
- Web Servers: Nginx, Apache, Caddy, Traefik
- Monitoring: Prometheus, Grafana, ELK Stack
- CI/CD: Jenkins, GitLab CI, GitHub Actions
- Networking: VPNs, Load Balancers, DNS, Firewalls
- Storage: S3, MinIO, NFS, Ceph
- Security: Authentication, SSL/TLS, Firewalls

Step 2: Extract Core Operations

For each domain, document:

Setup/Provisioning: How to create new instances
Configuration: How to configure for different use cases
Operations: Day-to-day management tasks
Troubleshooting: Common issues and resolutions
Scaling: How to scale up/down
Backup/Recovery: Disaster recovery procedures
Monitoring: Health checks and alerts
Security: Security best practices

Infrastructure Skill Template

---
name: [infrastructure-component]-manager
description: Expert guidance for [component] management, provisioning, troubleshooting, and operations
license: MIT
tags: [infrastructure, [component], operations, troubleshooting]
---

# [Component] Manager

Expert knowledge for managing [component] infrastructure.

## Authentication & Access

### Access Methods
```bash
# How to access the infrastructure component
ssh user@host
# or
kubectl config use-context cluster-name

Credentials & Configuration

Where credentials are stored
How to configure access
Common authentication issues

Architecture Overview

Component Topology

How components are organized
Network topology
Resource allocation
Redundancy setup

Key Resources

Resource 1: Purpose and specs
Resource 2: Purpose and specs
Resource 3: Purpose and specs

Common Operations

Operation 1: [e.g., Create New Instance]

# Step-by-step commands
command1 --flags
command2 --flags

# Verification
verify-command

Operation 2: [e.g., Update Configuration]

# Commands and explanations

Operation 3: [e.g., Scale Resources]

# Commands and explanations

Monitoring & Health Checks

Check System Status

# Health check commands
status-command

# Expected output
# What healthy output looks like

Common Metrics

Metric 1: What it means, normal range
Metric 2: What it means, normal range
Metric 3: What it means, normal range

Troubleshooting

Issue 1: [Common Problem]

Symptoms: What you observe
Cause: Why it happens
Fix: Step-by-step resolution

# Fix commands

Issue 2: [Another Problem]

Symptoms:
Cause:
Fix:

# Fix commands

Backup & Recovery

Backup Procedures

# How to backup
backup-command

# Verification
verify-backup

Recovery Procedures

# How to restore
restore-command

# Verification
verify-restore

Security Best Practices

Security practice 1
Security practice 2
Security practice 3

Quick Reference

Task	Command
Task 1	`command1`
Task 2	`command2`
Task 3	`command3`

Additional Resources

Official documentation links
Related skills
External references

## Real-World Example: Proxmox Skill

Based on the proxmox-auth skill in this repository:

### Extracted Knowledge

**From**: Proxmox VE cluster documentation + operational experience

**Structured as**:
1. **Authentication**: SSH access patterns, node IPs
2. **Architecture**: Cluster topology (2 nodes, resources)
3. **Operations**: Container/VM management commands
4. **Troubleshooting**: Common errors and fixes
5. **Networking**: Bridge configuration, IP management
6. **GPU Passthrough**: Special container configurations

**Result**: Comprehensive skill covering:
- Quick access to any node
- Container lifecycle management
- GPU-accelerated containers
- Network troubleshooting
- Backup procedures
- Common gotchas and solutions

## Extraction Scripts

### Extract from Runbooks

```bash
#!/bin/bash
# extract-from-runbook.sh - Convert runbook to skill

RUNBOOK_FILE="$1"
SKILL_NAME="$2"

if [ -z "$RUNBOOK_FILE" ] || [ -z "$SKILL_NAME" ]; then
    echo "Usage: $0 <runbook.md> <skill-name>"
    exit 1
fi

SKILL_DIR="skills/$SKILL_NAME"
mkdir -p "$SKILL_DIR"

# Extract sections from runbook
cat > "$SKILL_DIR/SKILL.md" << EOF
---
name: $SKILL_NAME
description: $(head -5 "$RUNBOOK_FILE" | grep -v "^#" | head -1 | xargs)
license: MIT
extracted-from: $RUNBOOK_FILE
---

# ${SKILL_NAME^}

$(cat "$RUNBOOK_FILE")

---

**Note**: This skill was auto-extracted from runbook documentation.
Review and refine before use.
EOF

echo "✓ Created skill: $SKILL_DIR/SKILL.md"
echo "Review and edit to add:"
echo "  - Metadata and tags"
echo "  - Troubleshooting section"
echo "  - Quick reference"
echo "  - Examples"

Extract from Docker Compose

#!/bin/bash
# docker-compose-to-skill.sh - Extract skill from docker-compose.yaml

COMPOSE_FILE="${1:-docker-compose.yaml}"
PROJECT_NAME=$(basename $(pwd))

SKILL_DIR="skills/docker-$PROJECT_NAME"
mkdir -p "$SKILL_DIR"

# Extract services
SERVICES=$(yq eval '.services | keys | .[]' "$COMPOSE_FILE")

cat > "$SKILL_DIR/SKILL.md" << EOF
---
name: docker-$PROJECT_NAME
description: Docker Compose configuration and management for $PROJECT_NAME
license: MIT
---

# Docker $PROJECT_NAME

Manage Docker Compose stack for $PROJECT_NAME.

## Services

$(yq eval '.services | to_entries | .[] | "### " + .key + "\n" + (.value.image // "custom") + "\n"' "$COMPOSE_FILE")

## Quick Start

\`\`\`bash
# Start all services
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f

# Stop all services
docker-compose down
\`\`\`

## Service Details

### Ports

$(yq eval '.services | to_entries | .[] | select(.value.ports) | "- **" + .key + "**: " + (.value.ports | join(", "))' "$COMPOSE_FILE")

### Volumes

$(yq eval '.services | to_entries | .[] | select(.value.volumes) | "- **" + .key + "**: " + (.value.volumes | join(", "))' "$COMPOSE_FILE")

## Configuration

See \`$COMPOSE_FILE\` for full configuration.

## Common Operations

### Restart Service

\`\`\`bash
docker-compose restart SERVICE_NAME
\`\`\`

### Update Service

\`\`\`bash
docker-compose pull SERVICE_NAME
docker-compose up -d SERVICE_NAME
\`\`\`

### View Service Logs

\`\`\`bash
docker-compose logs -f SERVICE_NAME
\`\`\`

## Troubleshooting

### Service Won't Start

1. Check logs: \`docker-compose logs SERVICE_NAME\`
2. Verify ports not in use: \`netstat -tulpn | grep PORT\`
3. Check disk space: \`df -h\`

### Network Issues

\`\`\`bash
# Recreate network
docker-compose down
docker network prune
docker-compose up -d
\`\`\`
EOF

echo "✓ Created skill from docker-compose.yaml"

Extract from Kubernetes Manifests

#!/bin/bash
# k8s-to-skill.sh - Extract skill from Kubernetes manifests

K8S_DIR="${1:-.}"
APP_NAME="${2:-$(basename $(pwd))}"

SKILL_DIR="skills/k8s-$APP_NAME"
mkdir -p "$SKILL_DIR"

cat > "$SKILL_DIR/SKILL.md" << EOF
---
name: k8s-$APP_NAME
description: Kubernetes deployment and management for $APP_NAME
license: MIT
---

# Kubernetes $APP_NAME

Manage Kubernetes resources for $APP_NAME.

## Resources

$(find "$K8S_DIR" -name "*.yaml" -o -name "*.yml" | while read file; do
    KIND=$(yq eval '.kind' "$file" 2>/dev/null)
    NAME=$(yq eval '.metadata.name' "$file" 2>/dev/null)
    echo "- **$KIND**: $NAME ($(basename $file))"
done)

## Deployment

### Apply All Resources

\`\`\`bash
kubectl apply -f $K8S_DIR/
\`\`\`

### Check Status

\`\`\`bash
# Pods
kubectl get pods -l app=$APP_NAME

# Services
kubectl get svc -l app=$APP_NAME

# Deployments
kubectl get deploy -l app=$APP_NAME
\`\`\`

## Common Operations

### Scale Deployment

\`\`\`bash
kubectl scale deployment $APP_NAME --replicas=3
\`\`\`

### Update Image

\`\`\`bash
kubectl set image deployment/$APP_NAME container=new-image:tag
\`\`\`

### View Logs

\`\`\`bash
kubectl logs -f deployment/$APP_NAME
\`\`\`

### Port Forward

\`\`\`bash
kubectl port-forward svc/$APP_NAME 8080:80
\`\`\`

## Troubleshooting

### Pod Not Starting

\`\`\`bash
# Check pod events
kubectl describe pod POD_NAME

# Check logs
kubectl logs POD_NAME

# Previous instance logs
kubectl logs POD_NAME --previous
\`\`\`

### Service Not Reachable

\`\`\`bash
# Check endpoints
kubectl get endpoints $APP_NAME

# Check service
kubectl describe svc $APP_NAME

# Test from another pod
kubectl run -it --rm debug --image=busybox --restart=Never -- wget -O- http://$APP_NAME
\`\`\`

## Quick Reference

| Task | Command |
|------|---------|
| Apply | \`kubectl apply -f $K8S_DIR/\` |
| Status | \`kubectl get all -l app=$APP_NAME\` |
| Logs | \`kubectl logs -f deployment/$APP_NAME\` |
| Scale | \`kubectl scale deployment $APP_NAME --replicas=N\` |
| Delete | \`kubectl delete -f $K8S_DIR/\` |
EOF

echo "✓ Created Kubernetes skill"

Infrastructure Patterns to Capture

Pattern 1: SSH Access Matrix

## SSH Access

| Host | IP | Purpose | Access |
|------|--------|---------|--------|
| node1 | 192.168.1.10 | Primary | `ssh node1` |
| node2 | 192.168.1.11 | Secondary | `ssh node2` |
| bastion | 203.0.113.5 | Jump host | `ssh -J bastion node1` |

Pattern 2: Service Port Mapping

## Service Ports

| Service | Internal | External | Protocol |
|---------|----------|----------|----------|
| Web | 8080 | 80 | HTTP |
| API | 3000 | 443 | HTTPS |
| DB | 5432 | - | TCP |

Pattern 3: Configuration Files

## Configuration Locations

### Application Config
- Path: `/etc/app/config.yaml`
- Format: YAML
- Requires restart: Yes

### Database Config
- Path: `/var/lib/postgres/postgresql.conf`
- Format: INI
- Requires restart: Yes

Pattern 4: Command Workflows

## Deployment Workflow

1. **Backup current state**
   ```bash
   ./backup.sh
   ```

2. **Pull latest code**
   ```bash
   git pull origin main
   ```

3. **Build application**
   ```bash
   docker build -t app:latest .
   ```

4. **Deploy**
   ```bash
   docker-compose up -d
   ```

5. **Verify**
   ```bash
   curl http://localhost/health
   ```

Best Practices

✅ DO

Document assumptions - What's required before operations
Include verification - How to verify each operation succeeded
Add troubleshooting - Common issues and fixes
Show outputs - Expected command outputs
Link resources - Related documentation and skills
Version information - Software versions, configurations
Security notes - Security implications of operations
Update regularly - Keep skills current with infrastructure

❌ DON'T

Don't hardcode secrets - Use placeholders or env vars
Don't skip context - Explain why, not just how
Don't assume knowledge - Explain terminology
Don't omit edge cases - Document special scenarios
Don't forget cleanup - Include teardown procedures
Don't ignore dependencies - Document prerequisites
Don't skip testing - Verify all commands work
Don't leave TODO - Complete all sections

Quality Checklist

[ ] Clear component description
[ ] Authentication/access documented
[ ] Architecture overview provided
[ ] Common operations with examples
[ ] Troubleshooting section complete
[ ] Health checks documented
[ ] Backup/recovery procedures
[ ] Security considerations noted
[ ] Quick reference table
[ ] All commands tested
[ ] No hardcoded secrets
[ ] Links to resources

Quick Start Workflow

# 1. Identify infrastructure component
COMPONENT="nginx-reverse-proxy"

# 2. Gather documentation
# - Collect README files
# - Export wiki pages
# - Capture team knowledge
# - Document current setup

# 3. Create skill structure
mkdir -p skills/$COMPONENT

# 4. Fill in template
# Use the Infrastructure Skill Template above

# 5. Test all commands
# Verify every command in skill works

# 6. Review and refine
# Have team review for completeness

# 7. Commit to repository
git add skills/$COMPONENT
git commit -m "docs: Add $COMPONENT infrastructure skill"

Version: 1.0.0
Author: Harvested from proxmox-auth skill pattern
Last Updated: 2025-11-18
License: MIT
Key Principle: Convert tribal knowledge into structured, searchable, actionable skills.

Examples in This Repository

proxmox-auth: Proxmox VE cluster management
docker-*: Docker-based infrastructure
cloudflare-*: Cloudflare infrastructure services

Transform your infrastructure documentation into skills today! 🏗️

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.