Implement GitOps workflows with ArgoCD and Flux for automated, declarative Kubernetes...
npx skills add jgarrison929/openclaw-skills --skill "ci-cd-specialist"
Install specific skill from multi-skill repository
# Description
Use when setting up CI/CD pipelines, GitHub Actions workflows, release automation, deployment strategies (blue-green, canary, rolling), hotfix processes, semantic versioning, containerization, or any build/deploy automation task.
# SKILL.md
name: ci-cd-specialist
version: 1.0.0
description: Use when setting up CI/CD pipelines, GitHub Actions workflows, release automation, deployment strategies (blue-green, canary, rolling), hotfix processes, semantic versioning, containerization, or any build/deploy automation task.
triggers:
- CI/CD
- continuous integration
- continuous deployment
- GitHub Actions
- GitLab CI
- Jenkins
- release
- deployment
- blue-green
- canary
- rolling update
- hotfix
- semantic versioning
- semver
- Docker build
- container
- pipeline
- workflow
- rollback
role: specialist
scope: implementation
output-format: code
CI/CD Specialist
Senior CI/CD engineer specializing in GitHub Actions, release workflows, deployment strategies, hotfix processes, containerization, and build pipeline automation.
Role Definition
You are a senior CI/CD engineer who builds reliable, fast, and secure build/deploy pipelines. You automate everything from linting to production deployment. You design for fail-fast feedback, caching, security scanning, and safe rollback.
Core Principles
- Fail fast — catch errors in the cheapest stage (lint → test → build → deploy)
- Reproducible builds — same commit always produces the same artifact
- Automate everything — manual steps are failure points and bottlenecks
- Never expose secrets — in logs, artifacts, or error messages
- Always have a rollback — every deployment must be reversible
- Version everything — code, config, infrastructure, dependencies
GitHub Actions: Complete CI Pipeline
# .github/workflows/ci.yml
name: CI
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true # Cancel stale runs on same branch
env:
NODE_VERSION: '20'
jobs:
# ── Lint & Type Check (fastest feedback) ──────────────────
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- run: npm ci
- run: npm run lint
- run: npm run typecheck
# ── Unit Tests (parallel with lint) ───────────────────────
test:
runs-on: ubuntu-latest
strategy:
matrix:
shard: [1, 2, 3] # Parallel test shards
services:
postgres:
image: postgres:16-alpine
env:
POSTGRES_PASSWORD: test
POSTGRES_DB: test_db
ports: ['5432:5432']
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- run: npm ci
- run: npm run db:migrate
env:
DATABASE_URL: postgres://postgres:test@localhost:5432/test_db
- run: npm test -- --shard=${{ matrix.shard }}/3 --coverage
env:
DATABASE_URL: postgres://postgres:test@localhost:5432/test_db
- uses: actions/upload-artifact@v4
with:
name: coverage-${{ matrix.shard }}
path: coverage/
# ── Security Scan (parallel) ──────────────────────────────
security:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm audit --audit-level=high
- uses: github/codeql-action/init@v3
with:
languages: javascript
- uses: github/codeql-action/analyze@v3
# ── Build (after lint + test pass) ────────────────────────
build:
needs: [lint, test]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- run: npm ci
- run: npm run build
- uses: actions/upload-artifact@v4
with:
name: build-output
path: dist/
retention-days: 7
# ── Docker Image (only on main) ───────────────────────────
docker:
needs: [build, security]
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v5
with:
context: .
push: true
tags: |
ghcr.io/${{ github.repository }}:${{ github.sha }}
ghcr.io/${{ github.repository }}:latest
cache-from: type=gha
cache-to: type=gha,mode=max
Production Dockerfile (Multi-Stage)
# ── Stage 1: Dependencies ──────────────────────────────────
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --only=production && \
cp -R node_modules /prod_modules && \
npm ci # Full install for build stage
# ── Stage 2: Build ─────────────────────────────────────────
FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
# ── Stage 3: Production Runtime ────────────────────────────
FROM node:20-alpine AS runtime
WORKDIR /app
# Security: non-root user
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser
# Copy only production dependencies + built output
COPY --from=deps /prod_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json ./
# Health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD wget -qO- http://localhost:3000/health || exit 1
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
Docker Compose for Local Development
# docker-compose.yml
services:
app:
build:
context: .
target: build # Use build stage for dev (includes devDeps)
ports:
- "3000:3000"
volumes:
- ./src:/app/src # Hot reload
environment:
- NODE_ENV=development
- DATABASE_URL=postgres://postgres:postgres@db:5432/app_dev
- REDIS_URL=redis://redis:6379
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
db:
image: postgres:16-alpine
environment:
POSTGRES_PASSWORD: postgres
POSTGRES_DB: app_dev
ports:
- "5432:5432"
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 5
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
pgdata:
Release Automation
Semantic Release Workflow
# .github/workflows/release.yml
name: Release
on:
push:
branches: [main]
permissions:
contents: write
packages: write
jobs:
release:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for changelog generation
- uses: actions/setup-node@v4
with:
node-version: 20
cache: 'npm'
- run: npm ci
- run: npm test
- run: npm run build
# Determine version bump from conventional commits
- name: Determine version
id: version
run: |
# Analyze commits since last tag
LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
COMMITS=$(git log ${LAST_TAG}..HEAD --pretty=format:"%s")
if echo "$COMMITS" | grep -qE "^feat!:|^BREAKING CHANGE:"; then
echo "bump=major" >> $GITHUB_OUTPUT
elif echo "$COMMITS" | grep -qE "^feat(\(.+\))?:"; then
echo "bump=minor" >> $GITHUB_OUTPUT
elif echo "$COMMITS" | grep -qE "^fix(\(.+\))?:"; then
echo "bump=patch" >> $GITHUB_OUTPUT
else
echo "bump=none" >> $GITHUB_OUTPUT
fi
- name: Bump version
if: steps.version.outputs.bump != 'none'
run: |
npm version ${{ steps.version.outputs.bump }} --no-git-tag-version
VERSION=$(node -p "require('./package.json').version")
echo "VERSION=$VERSION" >> $GITHUB_ENV
- name: Generate changelog
if: steps.version.outputs.bump != 'none'
run: |
LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
{
echo "## What's Changed"
echo ""
git log ${LAST_TAG}..HEAD --pretty=format:"- %s (%h)" | \
sed 's/^- feat/- ✨ feat/; s/^- fix/- 🐛 fix/; s/^- docs/- 📚 docs/; s/^- perf/- ⚡ perf/'
} > RELEASE_NOTES.md
- name: Commit, tag, and push
if: steps.version.outputs.bump != 'none'
run: |
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git add package.json package-lock.json
git commit -m "chore: release v${VERSION}"
git tag "v${VERSION}"
git push origin main --tags
- name: Create GitHub Release
if: steps.version.outputs.bump != 'none'
uses: softprops/action-gh-release@v1
with:
tag_name: v${{ env.VERSION }}
body_path: RELEASE_NOTES.md
generate_release_notes: true
Conventional Commits Reference
feat: New feature → MINOR bump (1.0.0 → 1.1.0)
fix: Bug fix → PATCH bump (1.0.0 → 1.0.1)
feat!: Breaking feature → MAJOR bump (1.0.0 → 2.0.0)
docs: Documentation only → no release
style: Formatting, whitespace → no release
refactor: Code restructuring → no release
perf: Performance improvement → PATCH bump
test: Adding/fixing tests → no release
chore: Build, tooling, deps → no release
# Examples
feat(auth): add OAuth2 login support
fix(api): handle null response from payment gateway
feat!: rename User.fullName to User.displayName
BREAKING CHANGE: User.fullName has been renamed to User.displayName.
Update all references in your code.
Deployment Strategies
Blue-Green Deployment
# .github/workflows/deploy-blue-green.yml
name: Deploy (Blue-Green)
on:
workflow_dispatch:
inputs:
version:
description: 'Version to deploy'
required: true
jobs:
deploy:
runs-on: ubuntu-latest
environment: production
steps:
- name: Deploy to inactive environment
run: |
# Determine which environment is inactive
ACTIVE=$(aws elbv2 describe-target-groups \
--names "app-blue" "app-green" \
--query "TargetGroups[?length(LoadBalancerArns)>`0`].TargetGroupName" \
--output text)
if [ "$ACTIVE" = "app-blue" ]; then
DEPLOY_TO="green"
else
DEPLOY_TO="blue"
fi
echo "Deploying v${{ inputs.version }} to $DEPLOY_TO"
# Deploy new version to inactive environment
aws ecs update-service \
--cluster production \
--service "app-${DEPLOY_TO}" \
--task-definition "app:${{ inputs.version }}" \
--desired-count 3
- name: Health check inactive environment
run: |
for i in {1..30}; do
if curl -sf "https://${DEPLOY_TO}.internal.example.com/health"; then
echo "Health check passed"
exit 0
fi
sleep 10
done
echo "Health check failed"
exit 1
- name: Switch traffic
run: |
# Update ALB listener to point to new target group
aws elbv2 modify-listener \
--listener-arn $LISTENER_ARN \
--default-actions Type=forward,TargetGroupArn=$NEW_TG_ARN
- name: Verify production
run: |
sleep 30
curl -sf https://api.example.com/health
# Run smoke tests
npm run test:smoke:production
- name: Scale down old environment
run: |
aws ecs update-service \
--cluster production \
--service "app-${OLD_ENV}" \
--desired-count 0
Canary Deployment
# Kubernetes canary with progressive traffic shifting
# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-canary
labels:
app: myapp
track: canary
spec:
replicas: 1 # Start with 1 canary pod
selector:
matchLabels:
app: myapp
track: canary
template:
metadata:
labels:
app: myapp
track: canary
spec:
containers:
- name: app
image: ghcr.io/org/app:v1.3.0 # New version
ports:
- containerPort: 3000
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
---
# Istio VirtualService for traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: app-routing
spec:
hosts: [api.example.com]
http:
- route:
- destination:
host: app-stable
port:
number: 3000
weight: 90 # 90% to stable
- destination:
host: app-canary
port:
number: 3000
weight: 10 # 10% to canary
Hotfix Process
#!/bin/bash
# scripts/hotfix.sh — Guided hotfix workflow
set -e
echo "🚨 HOTFIX WORKFLOW"
echo "=================="
# 1. Branch from latest production tag
LATEST_TAG=$(git describe --tags --abbrev=0)
echo "Latest production tag: $LATEST_TAG"
read -p "Hotfix branch name (e.g., fix-auth-bypass): " BRANCH_NAME
git checkout "$LATEST_TAG"
git checkout -b "hotfix/$BRANCH_NAME"
echo ""
echo "📝 Make your fix now, then run this script again with --continue"
echo " Remember: MINIMAL changes only. No refactoring."
echo ""
if [ "$1" = "--continue" ]; then
# 2. Run targeted tests
echo "Running tests..."
npm test
# 3. Version bump (patch)
npm version patch --no-git-tag-version
VERSION=$(node -p "require('./package.json').version")
# 4. Commit, tag, push
git add -A
git commit -m "hotfix: $BRANCH_NAME
Fixes critical issue in production.
Version: $VERSION"
git tag "v${VERSION}"
git push origin "hotfix/$BRANCH_NAME" "v${VERSION}"
# 5. Merge back to main
echo "Creating PR to merge hotfix back to main..."
gh pr create \
--base main \
--title "hotfix: merge $BRANCH_NAME back to main" \
--body "Automated PR to merge hotfix v${VERSION} back to main branch."
echo ""
echo "✅ Hotfix v${VERSION} tagged and pushed."
echo " Deploy will trigger automatically from the tag."
echo " Don't forget to merge the PR back to main!"
fi
Rollback Script
#!/bin/bash
# scripts/rollback.sh — Emergency rollback
set -e
echo "🔄 ROLLBACK PROCEDURE"
echo "====================="
# Show recent versions
echo "Recent versions:"
git tag --sort=-version:refname | head -5
echo ""
CURRENT=$(curl -sf https://api.example.com/version 2>/dev/null || echo "unknown")
echo "Current production version: $CURRENT"
echo ""
read -p "Rollback to version (e.g., v1.2.3): " TARGET_VERSION
# Validate version exists
if ! git rev-parse "$TARGET_VERSION" >/dev/null 2>&1; then
echo "❌ Version $TARGET_VERSION not found"
exit 1
fi
echo ""
echo "⚠️ Rolling back from $CURRENT to $TARGET_VERSION"
read -p "Are you sure? (type 'rollback' to confirm): " CONFIRM
if [ "$CONFIRM" != "rollback" ]; then
echo "Cancelled."
exit 0
fi
# Check for database migrations between versions
echo "Checking for database migrations..."
MIGRATION_COUNT=$(git diff --name-only "$TARGET_VERSION"..HEAD -- 'migrations/' | wc -l)
if [ "$MIGRATION_COUNT" -gt 0 ]; then
echo "⚠️ WARNING: $MIGRATION_COUNT migration(s) exist between versions."
echo " Database rollback may cause data loss."
read -p "Continue anyway? (yes/no): " DB_CONFIRM
if [ "$DB_CONFIRM" != "yes" ]; then
echo "Cancelled. Consider a forward fix instead."
exit 0
fi
fi
# Execute rollback
echo "Deploying $TARGET_VERSION..."
kubectl set image deployment/app app="ghcr.io/org/app:${TARGET_VERSION}" \
--record
kubectl rollout status deployment/app --timeout=300s
# Verify
echo "Verifying..."
sleep 15
NEW_VERSION=$(curl -sf https://api.example.com/version)
if [ "$NEW_VERSION" = "$TARGET_VERSION" ]; then
echo "✅ Rollback to $TARGET_VERSION successful"
else
echo "❌ Version mismatch: expected $TARGET_VERSION, got $NEW_VERSION"
echo " Manual intervention may be required."
exit 1
fi
Pipeline Optimization Tips
# Speed up CI with these patterns:
# 1. Cancel stale runs
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true
# 2. Cache aggressively
- uses: actions/cache@v4
with:
path: |
~/.npm
node_modules
key: deps-${{ hashFiles('package-lock.json') }}
# 3. Run independent jobs in parallel (not sequential)
jobs:
lint: ...
test: ...
security: ...
build:
needs: [lint, test] # Only build depends on lint+test
# 4. Use matrix for parallel test shards
test:
strategy:
matrix:
shard: [1, 2, 3, 4]
steps:
- run: npm test -- --shard=${{ matrix.shard }}/4
# 5. Skip jobs when only docs changed
test:
if: |
!contains(github.event.head_commit.message, '[skip ci]') &&
!startsWith(github.event.head_commit.message, 'docs:')
# 6. Docker layer caching
- uses: docker/build-push-action@v5
with:
cache-from: type=gha
cache-to: type=gha,mode=max
Anti-Patterns to Avoid
- ❌ No concurrency control — stale CI runs waste resources and confuse results
- ❌ Sequential jobs that could run parallel — lint/test/security are independent
- ❌ Building Docker images without cache — multi-stage + layer caching saves minutes
- ❌ Secrets in code or logs — use GitHub secrets + mask in output
- ❌ No rollback plan — if you can't undo it, don't deploy it
- ❌ Manual version bumping — use conventional commits + automated semver
- ❌ Deploying without health checks — verify the new version actually works
- ❌ Skipping staging — "it works in CI" is not enough
- ❌ Massive infrequent releases — small frequent releases are safer and easier to debug
- ❌ No deployment notifications — the team should know when and what was deployed
Adapted from buildwithclaude by Dave Poon (MIT)
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.