mindrally

ci-cd-best-practices

# Install this skill:
npx skills add Mindrally/skills --skill "ci-cd-best-practices"

Install specific skill from multi-skill repository

# Description

CI/CD best practices for building automated pipelines, deployment strategies, testing, and DevOps workflows across platforms

# SKILL.md


name: ci-cd-best-practices
description: CI/CD best practices for building automated pipelines, deployment strategies, testing, and DevOps workflows across platforms


CI/CD Best Practices

You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.

Core Principles

  • Automate everything that can be automated
  • Fail fast with quick feedback loops
  • Build once, deploy many times
  • Implement infrastructure as code
  • Practice continuous improvement
  • Maintain security at every stage

Pipeline Design

Pipeline Stages

A typical CI/CD pipeline includes these stages:

Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)

1. Build Stage

build:
  stage: build
  script:
    # npm ci deletes node_modules/ before installing, so cache npm's download cache instead
    - npm ci --cache .npm --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .npm/

Best practices:
- Use dependency caching to speed up builds
- Generate build artifacts for downstream stages
- Pin dependency versions for reproducibility
- Use multi-stage Docker builds for smaller images

2. Test Stage

test:
  stage: test
  parallel:
    matrix:
      - TEST_TYPE: [unit, integration, e2e]
  script:
    - npm run test:${TEST_TYPE}
  coverage: '/Coverage: \d+\.\d+%/'
  artifacts:
    reports:
      junit: test-results.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

Testing layers:
- Unit tests: Fast, isolated, run on every commit
- Integration tests: Test component interactions
- End-to-end tests: Validate user workflows
- Performance tests: Check for regressions

3. Security Stage

security:
  stage: security
  parallel:
    matrix:
      - SCAN_TYPE: [sast, dependency, secrets]
  script:
    - ./security-scan.sh ${SCAN_TYPE}
  allow_failure: false

Security scanning types:
- SAST: Static Application Security Testing
- DAST: Dynamic Application Security Testing
- Dependency scanning: Check for vulnerable packages
- Secret detection: Find leaked credentials
- Container scanning: Analyze Docker images

4. Deploy Stage

deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual

Deployment Strategies

Blue-Green Deployment

Maintain two identical environments:

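deploy:blue-green:
  script:
    - ./deploy-to-inactive.sh
    - ./run-smoke-tests.sh
    - ./switch-traffic.sh

# Keep the old environment until the release is verified, then clean up by hand
cleanup:blue-green:
  when: manual
  script:
    - ./cleanup-old-environment.sh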

Benefits:
- Zero-downtime deployments
- Easy rollback by switching traffic back
- Full testing in production-like environment
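
The switch-and-keep-for-rollback behaviour behind these benefits can be sketched as a small state holder. This is a minimal Python illustration under stated assumptions, not a real router integration: the `BlueGreenRouter` class and its method names are hypothetical stand-ins for whatever load balancer or DNS API you actually use.

```python
class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, live="blue"):
        self.live = live

    @property
    def idle(self):
        # The environment that is NOT serving traffic; new versions go here first.
        return "green" if self.live == "blue" else "blue"

    def release(self, smoke_test):
        """Switch traffic to the idle environment only if its smoke test passes.

        smoke_test is a callable taking an environment name and returning bool.
        Returns True when traffic was switched.
        """
        target = self.idle
        if not smoke_test(target):
            return False  # live environment untouched, nothing to roll back
        self.live = target
        return True

    def rollback(self):
        # Rolling back is just pointing traffic at the previous environment,
        # which is why it must not be torn down right after the switch.
        self.live = self.idle
```

A failed smoke test leaves traffic where it was, so a bad release never reaches users in this model.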

Canary Deployment

Gradually roll out to a subset of users:

deploy:canary:
  script:
    - ./deploy-canary.sh --percentage=5
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=25
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=100

Canary stages:
1. Deploy to 5% of traffic
2. Monitor error rates and latency
3. Gradually increase if metrics are healthy
4. Full rollout or rollback based on data
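
The promote-or-rollback decision in these stages can be sketched as a pure function. The step percentages mirror the example job above; the function name and the error-rate and latency thresholds are illustrative assumptions, not values from any particular monitoring tool.

```python
CANARY_STEPS = [5, 25, 100]  # traffic percentages, matching the example job above


def next_canary_action(current_pct, error_rate, p95_latency_ms,
                       max_error_rate=0.01, max_p95_latency_ms=500):
    """Return ("promote", next_pct), ("done", 100) or ("rollback", 0).

    The default thresholds are placeholders; pick values from your own baseline.
    """
    healthy = error_rate <= max_error_rate and p95_latency_ms <= max_p95_latency_ms
    if not healthy:
        return ("rollback", 0)
    if current_pct >= CANARY_STEPS[-1]:
        return ("done", CANARY_STEPS[-1])
    # Move to the first configured step above the current percentage.
    next_pct = next(step for step in CANARY_STEPS if step > current_pct)
    return ("promote", next_pct)
```

Keeping the decision separate from the deploy scripts makes the thresholds easy to review and test on their own.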

Rolling Deployment

Update instances incrementally:

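deploy:rolling:
  script:
    # Assumes a container named "app" and an IMAGE variable set earlier in the pipeline.
    # "rollout restart" alone only restarts pods on the image they already use.
    - kubectl set image deployment/app app=${IMAGE}:${CI_COMMIT_SHA}
    # Undo the rollout and fail the job if it does not become healthy in time
    - kubectl rollout status deployment/app --timeout=5m || { kubectl rollout undo deployment/app; exit 1; }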

Configuration:
- Set maxUnavailable and maxSurge
- Health checks determine rollout pace
- Rollback on failure: a Deployment does not roll back by itself, so run kubectl rollout undo when the rollout fails
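
To see what maxSurge and maxUnavailable allow during a rollout, the arithmetic can be sketched in Python. Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down, and both default to 25%; the function names here are hypothetical.

```python
import math


def _resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        fraction = replicas * int(value[:-1]) / 100
        return math.ceil(fraction) if round_up else math.floor(fraction)
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (min_available_pods, max_total_pods) during a rolling update."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge
```

With 10 replicas and the defaults, at least 8 pods stay available and at most 13 exist at once. Setting maxUnavailable to 0 trades rollout speed for full capacity throughout.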

Feature Flags

Decouple deployment from release:

// Feature flag check inside a React component.
// featureFlags stands for whatever flag client your application uses.
function Checkout() {
  return featureFlags.isEnabled('new-checkout')
    ? <NewCheckout />
    : <LegacyCheckout />;
}

Benefits:
- Deploy disabled features to production
- Gradual feature rollout
- A/B testing capabilities
- Quick feature disable without deployment
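
Gradual rollout is commonly built on deterministic bucketing: hash the flag name together with a user identifier so that each user gets a stable answer. The sketch below assumes that approach; `in_rollout` is a hypothetical helper, not the API of any specific feature flag service.

```python
import hashlib


def in_rollout(flag_name, user_id, rollout_percent):
    """Deterministically decide whether a user falls inside a percentage rollout."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < rollout_percent
```

Because the bucket never changes for a given flag and user, raising the percentage only adds users; nobody who already has the feature loses it.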

Environment Management

Environment Hierarchy

Development -> Testing -> Staging -> Production

Each environment should:
- Mirror production as closely as possible
- Have isolated data and secrets
- Use infrastructure as code

Environment Variables

variables:
  # Global variables
  APP_NAME: my-app

# Environment-specific templates (hidden jobs); a job applies one with extends: .staging
.staging:
  variables:
    ENV: staging
    API_URL: https://api.staging.example.com

.production:
  variables:
    ENV: production
    API_URL: https://api.example.com

Best practices:
- Never hardcode secrets
- Use a secrets manager (HashiCorp Vault, AWS Secrets Manager)
- Separate configuration from code
- Document all required variables
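
One way to keep the list of required variables honest is to check it at the start of a job and fail before anything is deployed. A minimal sketch, assuming the variable names from the example above; `missing_variables` is a hypothetical helper.

```python
import os

REQUIRED_VARS = ["APP_NAME", "ENV", "API_URL"]  # names taken from the example above


def missing_variables(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

A deploy script could call this first and exit with a non-zero status when the returned list is not empty.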

Infrastructure as Code

# Terraform example
resource "aws_ecs_service" "app" {
  name            = var.app_name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.environment == "production" ? 3 : 1

  # Rolling-update limits are top-level arguments on aws_ecs_service
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}

Testing Strategies

Test Pyramid

        /\
       /  \      E2E Tests (Few)
      /----\
     /      \    Integration Tests (Some)
    /--------\
   /          \  Unit Tests (Many)
  --------------

Test Parallelization

test:
  parallel: 4
  script:
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
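
How a shard index and total split a test suite can be illustrated with a simple round-robin assignment. This is only a sketch of the idea: real test runners use their own assignment and may balance by recorded timings. GitLab numbers CI_NODE_INDEX from 1, which the hypothetical `shard` helper below follows.

```python
def shard(test_files, node_index, node_total):
    """Return the test files for one parallel node (node_index is 1-based)."""
    if not 1 <= node_index <= node_total:
        raise ValueError("node_index must be between 1 and node_total")
    ordered = sorted(test_files)  # every node must see the same order
    return ordered[node_index - 1::node_total]
```

Sorting first matters: if nodes enumerate files in different orders, some tests run twice and others not at all.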

Test Data Management

  • Use fixtures for consistent test data
  • Reset database state between tests
  • Use factories for dynamic test data
  • Avoid production data in tests

Flaky Test Handling

test:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

Strategies:
- Quarantine flaky tests
- Add retry logic for known issues
- Investigate and fix root causes
- Track flaky test metrics
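
Tracking flaky tests starts with identifying them. One workable definition is a test that both passed and failed on the same commit, since the code did not change between runs. The sketch below assumes test results are available as simple tuples; the function name and input shape are illustrative.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    runs is an iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    # Two different outcomes for identical code means the test is nondeterministic.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})
```

A test that fails on one commit and passes on another is not flagged, because the change itself may explain the difference.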

Monitoring and Observability

Pipeline Metrics

Track these metrics:
- Lead time: Time from commit to running in production
- Deployment frequency: How often you deploy to production
- Change failure rate: Percentage of deployments that cause a production failure
- Mean time to recovery: Time to restore service after a failure
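
These four metrics can be computed from a plain list of deployment records. The sketch below assumes a hypothetical `Deployment` record with the fields shown; where those timestamps come from (CI API, incident tracker) is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def pipeline_metrics(deployments, period_days):
    """Summarise a non-empty list of deployments made over period_days days."""
    n = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time": sum(lead_times, timedelta()) / n,
        "deploys_per_day": n / period_days,
        "change_failure_rate": len(failures) / n,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Means are used here for brevity; medians are often more robust when a few deployments take unusually long.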

Health Checks

deploy:
  script:
    - ./deploy.sh
    - ./wait-for-healthy.sh --timeout=300
    - ./run-smoke-tests.sh

Implement:
- Readiness probes
- Liveness probes
- Startup probes
- Smoke tests post-deployment
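
A wait-for-healthy step is essentially a poll loop with a deadline. The sketch below shows one possible shape in Python; it is not the content of the wait-for-healthy.sh script referenced above, and the `check` callable is whatever probe fits your service, such as a request to a health endpoint.

```python
import time


def wait_for_healthy(check, timeout=300.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout seconds have elapsed.

    check is any zero-argument callable returning bool.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Using a monotonic clock keeps the timeout correct even if the system time is adjusted while the job runs.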

Alerting

notify:failure:
  stage: notify
  script:
    - ./send-alert.sh --channel=deployments --status=failed
  when: on_failure

notify:success:
  stage: notify
  script:
    - ./send-notification.sh --channel=deployments --status=success
  when: on_success

Security in CI/CD

Secrets Management

# Use CI/CD secret variables
deploy:
  script:
    # Restrictive umask so the key file is never readable by other users
    - umask 077
    - echo "$DEPLOY_KEY" | base64 -d > deploy_key
    - chmod 600 deploy_key
    - ./deploy.sh
  after_script:
    - rm -f deploy_key

Best practices:
- Rotate secrets regularly
- Use short-lived credentials
- Audit secret access
- Never log secrets
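
CI platforms usually mask variables that are marked as secret, but custom tooling that writes its own logs needs the same care. A minimal sketch of value-based masking follows; `mask_secrets` and the placeholder text are assumptions for illustration.

```python
def mask_secrets(line, secrets, placeholder="[MASKED]"):
    """Replace every occurrence of a known secret value in a log line."""
    # Longest first, so a secret that contains another is masked in full.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # skip empty values, which would match everywhere
            line = line.replace(secret, placeholder)
    return line
```

Masking is a last line of defence: it only catches exact values, so encoded or truncated forms of a secret can still leak.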

Pipeline Security

# Manual gate for production; control who may trigger it with protected branches or protected environments
deploy:production:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  environment:
    name: production
    deployment_tier: production

Controls:
- Branch protection rules
- Required approvals
- Audit logging
- Signed commits

Dependency Security

dependency_check:
  script:
    - npm audit --audit-level=high
    - ./check-licenses.sh
  allow_failure: false

Optimization Techniques

Caching

cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
  policy: pull-push

Cache strategies:
- Cache dependencies between runs
- Use content-based cache keys
- Separate cache per branch
- Clean stale caches periodically
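
A content-based cache key is a hash of the files that determine the cache contents, which is what the key:files setting above computes for you. The sketch below shows the idea in Python; the `cache_key` helper, its prefix, and the truncated digest length are illustrative choices.

```python
import hashlib


def cache_key(lockfile_path, prefix="npm"):
    """Build a cache key from the SHA-256 of a lockfile's contents."""
    with open(lockfile_path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return f"{prefix}-{digest[:16]}"
```

The key changes only when the lockfile changes, so branches with identical dependencies share one cache and a dependency bump invalidates it automatically.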

Parallelization

stages:
  - build
  - test
  - deploy

# Jobs in the same stage run in parallel
test:unit:
  stage: test
  script: npm run test:unit

test:integration:
  stage: test
  script: npm run test:integration

test:e2e:
  stage: test
  script: npm run test:e2e

Artifact Management

build:
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
    when: on_success

Best practices:
- Set appropriate expiration
- Only store necessary artifacts
- Use artifact compression
- Clean up old artifacts

Rollback Strategies

Automatic Rollback

deploy:
  script:
    - ./deploy.sh
    # Roll back on a failed health check, then fail the job so the pipeline reports it
    - ./health-check.sh || { ./rollback.sh; exit 1; }

Manual Rollback

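rollback:
  stage: deploy
  when: manual
  script:
    # A child script cannot set variables in this shell, so capture its output
    # (assumes get-previous-version.sh prints the version)
    - PREVIOUS_VERSION=$(./get-previous-version.sh)
    - ./deploy.sh --version="$PREVIOUS_VERSION"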

Database Rollbacks

  • Use reversible migrations
  • Test rollback procedures
  • Keep schema changes compatible with the previous application version
  • Have a tested backup restoration process
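
A reversible migration pairs every forward step with a step that undoes it, and a round-trip test confirms the pair works. The sketch below uses Python's built-in sqlite3 module with a hypothetical audit_log table; migration frameworks provide their own structure for this.

```python
import sqlite3


def upgrade(conn):
    # Forward step: add the new table.
    conn.execute(
        "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, message TEXT NOT NULL)"
    )


def downgrade(conn):
    # Reverse step: undo exactly what upgrade() did.
    conn.execute("DROP TABLE audit_log")


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def verify_round_trip(conn):
    """Apply the migration, roll it back, and confirm the schema is restored."""
    upgrade(conn)
    applied = table_exists(conn, "audit_log")
    downgrade(conn)
    reverted = not table_exists(conn, "audit_log")
    return applied and reverted
```

Running such a check in the pipeline against a scratch database catches migrations whose reverse step was never written or no longer matches.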

Documentation

Pipeline Documentation

Document in your repository:
- Pipeline stages and their purpose
- Required environment variables
- Deployment procedures
- Troubleshooting guides
- Rollback procedures

Runbooks

Create runbooks for:
- Deployment failures
- Rollback procedures
- Environment setup
- Incident response

Continuous Improvement

Metrics to Track

  • Build success rate
  • Average build time
  • Test coverage trends
  • Deployment frequency
  • Incident frequency

Regular Reviews

  • Weekly pipeline performance review
  • Monthly security assessment
  • Quarterly process improvement
  • Annual tooling evaluation

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
