ci-cd-best-practices

by @mindrally in DevOps & Cloud

# Install this skill:

npx skills add Mindrally/skills --skill "ci-cd-best-practices"

Install specific skill from multi-skill repository

# Description

CI/CD best practices for building automated pipelines, deployment strategies, testing, and DevOps workflows across platforms

# SKILL.md

name: ci-cd-best-practices
description: CI/CD best practices for building automated pipelines, deployment strategies, testing, and DevOps workflows across platforms

CI/CD Best Practices

You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.

Core Principles

Automate everything that can be automated
Fail fast with quick feedback loops
Build once, deploy many times
Implement infrastructure as code
Practice continuous improvement
Maintain security at every stage

Pipeline Design

Pipeline Stages

A typical CI/CD pipeline includes these stages:

Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)

1. Build Stage

build:
  stage: build
  script:
    - npm ci --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/

Best practices:
- Use dependency caching to speed up builds
- Generate build artifacts for downstream stages
- Pin dependency versions for reproducibility
- Use multi-stage Docker builds for smaller images

2. Test Stage

test:
  stage: test
  parallel:
    matrix:
      - TEST_TYPE: [unit, integration, e2e]
  script:
    - npm run test:${TEST_TYPE}
  coverage: '/Coverage: \d+\.\d+%/'
  artifacts:
    reports:
      junit: test-results.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

Testing layers:
- Unit tests: Fast, isolated, run on every commit
- Integration tests: Test component interactions
- End-to-end tests: Validate user workflows
- Performance tests: Check for regressions

3. Security Stage

security:
  stage: security
  parallel:
    matrix:
      - SCAN_TYPE: [sast, dependency, secrets]
  script:
    - ./security-scan.sh ${SCAN_TYPE}
  allow_failure: false

Security scanning types:
- SAST: Static Application Security Testing
- DAST: Dynamic Application Security Testing
- Dependency scanning: Check for vulnerable packages
- Secret detection: Find leaked credentials
- Container scanning: Analyze Docker images

4. Deploy Stage

deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual

Deployment Strategies

Blue-Green Deployment

Maintain two identical environments:

deploy:blue-green:
  script:
    - ./deploy-to-inactive.sh
    - ./run-smoke-tests.sh
    - ./switch-traffic.sh
    - ./cleanup-old-environment.sh

Benefits:
- Zero-downtime deployments
- Easy rollback by switching traffic back
- Full testing in production-like environment

Canary Deployment

Gradually roll out to subset of users:

deploy:canary:
  script:
    - ./deploy-canary.sh --percentage=5
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=25
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=100

Canary stages:
1. Deploy to 5% of traffic
2. Monitor error rates and latency
3. Gradually increase if metrics are healthy
4. Full rollout or rollback based on data

Rolling Deployment

Update instances incrementally:

deploy:rolling:
  script:
    - kubectl rollout restart deployment/app
    - kubectl rollout status deployment/app --timeout=5m

Configuration:
- Set maxUnavailable and maxSurge
- Health checks determine rollout pace
- Automatic rollback on failure

Feature Flags

Decouple deployment from release:

// Feature flag implementation
if (featureFlags.isEnabled('new-checkout')) {
  return <NewCheckout />;
} else {
  return <LegacyCheckout />;
}

Benefits:
- Deploy disabled features to production
- Gradual feature rollout
- A/B testing capabilities
- Quick feature disable without deployment

Environment Management

Environment Hierarchy

Development -> Testing -> Staging -> Production

Each environment should:
- Mirror production as closely as possible
- Have isolated data and secrets
- Use infrastructure as code

Environment Variables

variables:
  # Global variables
  APP_NAME: my-app

# Environment-specific
.staging:
  variables:
    ENV: staging
    API_URL: https://api.staging.example.com

.production:
  variables:
    ENV: production
    API_URL: https://api.example.com

Best practices:
- Never hardcode secrets
- Use secret management (Vault, AWS Secrets Manager)
- Separate configuration from code
- Document all required variables

Infrastructure as Code

# Terraform example
resource "aws_ecs_service" "app" {
  name            = var.app_name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.environment == "production" ? 3 : 1

  deployment_configuration {
    maximum_percent         = 200
    minimum_healthy_percent = 100
  }
}

Testing Strategies

Test Pyramid

        /\
       /  \      E2E Tests (Few)
      /----\
     /      \    Integration Tests (Some)
    /--------\
   /          \  Unit Tests (Many)
  --------------

Test Parallelization

test:
  parallel: 4
  script:
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

Test Data Management

Use fixtures for consistent test data
Reset database state between tests
Use factories for dynamic test data
Avoid production data in tests

Flaky Test Handling

test:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

Strategies:
- Quarantine flaky tests
- Add retry logic for known issues
- Investigate and fix root causes
- Track flaky test metrics

Monitoring and Observability

Pipeline Metrics

Track these metrics:
- Lead time: Commit to production duration
- Deployment frequency: How often you deploy
- Change failure rate: Percentage of failed deployments
- Mean time to recovery: Time to fix failures

Health Checks

deploy:
  script:
    - ./deploy.sh
    - ./wait-for-healthy.sh --timeout=300
    - ./run-smoke-tests.sh

Implement:
- Readiness probes
- Liveness probes
- Startup probes
- Smoke tests post-deployment

Alerting

notify:failure:
  stage: notify
  script:
    - ./send-alert.sh --channel=deployments --status=failed
  when: on_failure

notify:success:
  stage: notify
  script:
    - ./send-notification.sh --channel=deployments --status=success
  when: on_success

Security in CI/CD

Secrets Management

# Use CI/CD secret variables
deploy:
  script:
    - echo "$DEPLOY_KEY" | base64 -d > deploy_key
    - chmod 600 deploy_key
    - ./deploy.sh
  after_script:
    - rm -f deploy_key

Best practices:
- Rotate secrets regularly
- Use short-lived credentials
- Audit secret access
- Never log secrets

Pipeline Security

# Restrict who can run production deploys
deploy:production:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  environment:
    name: production
    deployment_tier: production

Controls:
- Branch protection rules
- Required approvals
- Audit logging
- Signed commits

Dependency Security

dependency_check:
  script:
    - npm audit --audit-level=high
    - ./check-licenses.sh
  allow_failure: false

Optimization Techniques

Caching

cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
  policy: pull-push

Cache strategies:
- Cache dependencies between runs
- Use content-based cache keys
- Separate cache per branch
- Clean stale caches periodically

Parallelization

stages:
  - build
  - test
  - deploy

# Run tests in parallel
test:unit:
  stage: test
  script: npm run test:unit

test:integration:
  stage: test
  script: npm run test:integration

test:e2e:
  stage: test
  script: npm run test:e2e

Artifact Management

build:
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
    when: on_success

Best practices:
- Set appropriate expiration
- Only store necessary artifacts
- Use artifact compression
- Clean up old artifacts

Rollback Strategies

Automatic Rollback

deploy:
  script:
    - ./deploy.sh
    - ./health-check.sh || ./rollback.sh

Manual Rollback

rollback:
  stage: deploy
  when: manual
  script:
    - ./get-previous-version.sh
    - ./deploy.sh --version=$PREVIOUS_VERSION

Database Rollbacks

Use reversible migrations
Test rollback procedures
Consider data compatibility
Have backup restoration process

Documentation

Pipeline Documentation

Document in your repository:
- Pipeline stages and their purpose
- Required environment variables
- Deployment procedures
- Troubleshooting guides
- Rollback procedures

Runbooks

Create runbooks for:
- Deployment failures
- Rollback procedures
- Environment setup
- Incident response

Continuous Improvement

Metrics to Track

Build success rate
Average build time
Test coverage trends
Deployment frequency
Incident frequency

Regular Reviews

Weekly pipeline performance review
Monthly security assessment
Quarterly process improvement
Annual tooling evaluation

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.