Implement GitOps workflows with ArgoCD and Flux for automated, declarative Kubernetes...
npx skills add Mindrally/skills --skill "ci-cd-best-practices"
Install specific skill from multi-skill repository
# Description
CI/CD best practices for building automated pipelines, deployment strategies, testing, and DevOps workflows across platforms
# SKILL.md
name: ci-cd-best-practices
description: CI/CD best practices for building automated pipelines, deployment strategies, testing, and DevOps workflows across platforms
CI/CD Best Practices
You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.
Core Principles
- Automate everything that can be automated
- Fail fast with quick feedback loops
- Build once, deploy many times
- Implement infrastructure as code
- Practice continuous improvement
- Maintain security at every stage
Pipeline Design
Pipeline Stages
A typical CI/CD pipeline includes these stages:
Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)
1. Build Stage
build:
stage: build
script:
- npm ci --prefer-offline
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- node_modules/
Best practices:
- Use dependency caching to speed up builds
- Generate build artifacts for downstream stages
- Pin dependency versions for reproducibility
- Use multi-stage Docker builds for smaller images
2. Test Stage
test:
stage: test
parallel:
matrix:
- TEST_TYPE: [unit, integration, e2e]
script:
- npm run test:${TEST_TYPE}
coverage: '/Coverage: \d+\.\d+%/'
artifacts:
reports:
junit: test-results.xml
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
Testing layers:
- Unit tests: Fast, isolated, run on every commit
- Integration tests: Test component interactions
- End-to-end tests: Validate user workflows
- Performance tests: Check for regressions
3. Security Stage
security:
stage: security
parallel:
matrix:
- SCAN_TYPE: [sast, dependency, secrets]
script:
- ./security-scan.sh ${SCAN_TYPE}
allow_failure: false
Security scanning types:
- SAST: Static Application Security Testing
- DAST: Dynamic Application Security Testing
- Dependency scanning: Check for vulnerable packages
- Secret detection: Find leaked credentials
- Container scanning: Analyze Docker images
4. Deploy Stage
deploy:staging:
stage: deploy
environment:
name: staging
url: https://staging.example.com
script:
- ./deploy.sh staging
rules:
- if: $CI_COMMIT_BRANCH == "develop"
deploy:production:
stage: deploy
environment:
name: production
url: https://example.com
script:
- ./deploy.sh production
rules:
- if: $CI_COMMIT_BRANCH == "main"
when: manual
Deployment Strategies
Blue-Green Deployment
Maintain two identical environments:
deploy:blue-green:
script:
- ./deploy-to-inactive.sh
- ./run-smoke-tests.sh
- ./switch-traffic.sh
- ./cleanup-old-environment.sh
Benefits:
- Zero-downtime deployments
- Easy rollback by switching traffic back
- Full testing in production-like environment
Canary Deployment
Gradually roll out to subset of users:
deploy:canary:
script:
- ./deploy-canary.sh --percentage=5
- ./monitor-metrics.sh --duration=30m
- ./deploy-canary.sh --percentage=25
- ./monitor-metrics.sh --duration=30m
- ./deploy-canary.sh --percentage=100
Canary stages:
1. Deploy to 5% of traffic
2. Monitor error rates and latency
3. Gradually increase if metrics are healthy
4. Full rollout or rollback based on data
Rolling Deployment
Update instances incrementally:
deploy:rolling:
script:
- kubectl rollout restart deployment/app
- kubectl rollout status deployment/app --timeout=5m
Configuration:
- Set maxUnavailable and maxSurge
- Health checks determine rollout pace
- Automatic rollback on failure
Feature Flags
Decouple deployment from release:
// Feature flag implementation
if (featureFlags.isEnabled('new-checkout')) {
return <NewCheckout />;
} else {
return <LegacyCheckout />;
}
Benefits:
- Deploy disabled features to production
- Gradual feature rollout
- A/B testing capabilities
- Quick feature disable without deployment
Environment Management
Environment Hierarchy
Development -> Testing -> Staging -> Production
Each environment should:
- Mirror production as closely as possible
- Have isolated data and secrets
- Use infrastructure as code
Environment Variables
variables:
# Global variables
APP_NAME: my-app
# Environment-specific
.staging:
variables:
ENV: staging
API_URL: https://api.staging.example.com
.production:
variables:
ENV: production
API_URL: https://api.example.com
Best practices:
- Never hardcode secrets
- Use secret management (Vault, AWS Secrets Manager)
- Separate configuration from code
- Document all required variables
Infrastructure as Code
# Terraform example
resource "aws_ecs_service" "app" {
name = var.app_name
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.app.arn
desired_count = var.environment == "production" ? 3 : 1
deployment_configuration {
maximum_percent = 200
minimum_healthy_percent = 100
}
}
Testing Strategies
Test Pyramid
/\
/ \ E2E Tests (Few)
/----\
/ \ Integration Tests (Some)
/--------\
/ \ Unit Tests (Many)
--------------
Test Parallelization
test:
parallel: 4
script:
- npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
Test Data Management
- Use fixtures for consistent test data
- Reset database state between tests
- Use factories for dynamic test data
- Avoid production data in tests
Flaky Test Handling
test:
retry:
max: 2
when:
- runner_system_failure
- stuck_or_timeout_failure
Strategies:
- Quarantine flaky tests
- Add retry logic for known issues
- Investigate and fix root causes
- Track flaky test metrics
Monitoring and Observability
Pipeline Metrics
Track these metrics:
- Lead time: Commit to production duration
- Deployment frequency: How often you deploy
- Change failure rate: Percentage of failed deployments
- Mean time to recovery: Time to fix failures
Health Checks
deploy:
script:
- ./deploy.sh
- ./wait-for-healthy.sh --timeout=300
- ./run-smoke-tests.sh
Implement:
- Readiness probes
- Liveness probes
- Startup probes
- Smoke tests post-deployment
Alerting
notify:failure:
stage: notify
script:
- ./send-alert.sh --channel=deployments --status=failed
when: on_failure
notify:success:
stage: notify
script:
- ./send-notification.sh --channel=deployments --status=success
when: on_success
Security in CI/CD
Secrets Management
# Use CI/CD secret variables
deploy:
script:
- echo "$DEPLOY_KEY" | base64 -d > deploy_key
- chmod 600 deploy_key
- ./deploy.sh
after_script:
- rm -f deploy_key
Best practices:
- Rotate secrets regularly
- Use short-lived credentials
- Audit secret access
- Never log secrets
Pipeline Security
# Restrict who can run production deploys
deploy:production:
rules:
- if: $CI_COMMIT_BRANCH == "main"
when: manual
allow_failure: false
environment:
name: production
deployment_tier: production
Controls:
- Branch protection rules
- Required approvals
- Audit logging
- Signed commits
Dependency Security
dependency_check:
script:
- npm audit --audit-level=high
- ./check-licenses.sh
allow_failure: false
Optimization Techniques
Caching
cache:
key:
files:
- package-lock.json
paths:
- node_modules/
policy: pull-push
Cache strategies:
- Cache dependencies between runs
- Use content-based cache keys
- Separate cache per branch
- Clean stale caches periodically
Parallelization
stages:
- build
- test
- deploy
# Run tests in parallel
test:unit:
stage: test
script: npm run test:unit
test:integration:
stage: test
script: npm run test:integration
test:e2e:
stage: test
script: npm run test:e2e
Artifact Management
build:
artifacts:
paths:
- dist/
expire_in: 1 week
when: on_success
Best practices:
- Set appropriate expiration
- Only store necessary artifacts
- Use artifact compression
- Clean up old artifacts
Rollback Strategies
Automatic Rollback
deploy:
script:
- ./deploy.sh
- ./health-check.sh || ./rollback.sh
Manual Rollback
rollback:
stage: deploy
when: manual
script:
- ./get-previous-version.sh
- ./deploy.sh --version=$PREVIOUS_VERSION
Database Rollbacks
- Use reversible migrations
- Test rollback procedures
- Consider data compatibility
- Have backup restoration process
Documentation
Pipeline Documentation
Document in your repository:
- Pipeline stages and their purpose
- Required environment variables
- Deployment procedures
- Troubleshooting guides
- Rollback procedures
Runbooks
Create runbooks for:
- Deployment failures
- Rollback procedures
- Environment setup
- Incident response
Continuous Improvement
Metrics to Track
- Build success rate
- Average build time
- Test coverage trends
- Deployment frequency
- Incident frequency
Regular Reviews
- Weekly pipeline performance review
- Monthly security assessment
- Quarterly process improvement
- Annual tooling evaluation
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.