mitigation

by @miles-knowbl in Development

# Install this skill:

npx skills add miles-knowbl/orchestrator --skill "mitigation"

Installs a single skill from a multi-skill repository.

# Description

Execute immediate mitigation for production incidents. Decide between rollback, hotfix, feature flag, or traffic management. Prioritizes restoring service over finding root cause.

# SKILL.md

name: mitigation
description: "Execute immediate mitigation for production incidents. Decide between rollback, hotfix, feature flag, or traffic management. Prioritizes restoring service over finding root cause."
phase: IMPLEMENT
category: core
version: "1.0.0"
depends_on: [incident-triage]
tags: [incident, mitigation, hotfix, rollback]

Mitigation

Execute immediate mitigation to restore production service. This skill is about speed and correctness of response, not about understanding why the problem happened. Restore service first, investigate later. Choose the simplest, safest action that stops the bleeding.

When to Use

- After incident triage has classified severity and identified the blast radius
- When production service is degraded or down and users are impacted
- When a deployment has been identified as the likely cause
- When a feature needs to be disabled quickly to stop errors

Decision Framework

Select the mitigation strategy based on the situation:

Situation	Strategy	When to Use
Recent deploy caused the issue	Rollback	Deploy is <24h old, previous version was stable
A specific feature is broken	Feature Flag	Feature flag exists, issue is isolated to one feature
Targeted code fix is obvious and small	Hotfix	Fix is <20 lines, root cause is clear, can ship in <30 min
Traffic or capacity issue	Traffic Management	Scale up, rate limit, shed load, redirect traffic
External dependency is down	Circuit Breaker	Add fallback behavior, graceful degradation

Rule of thumb: if you can roll back, roll back. A rollback is almost always faster and safer than a hotfix.
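The decision framework can be sketched as a function that checks rollback first, as the rule of thumb suggests. This is a minimal sketch, not part of the skill: the `Situation` fields and `select_strategy` name are hypothetical, and the order of the checks after rollback (feature flag, hotfix, traffic management, circuit breaker) is an assumption that favors the fastest, lowest-risk action. The numeric thresholds come from the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):
    ROLLBACK = "Rollback"
    FEATURE_FLAG = "Feature Flag"
    HOTFIX = "Hotfix"
    TRAFFIC_MANAGEMENT = "Traffic Management"
    CIRCUIT_BREAKER = "Circuit Breaker"


@dataclass
class Situation:
    # Hypothetical summary of triage findings; field names are illustrative only.
    recent_deploy_suspected: bool = False
    deploy_age_hours: Optional[float] = None
    previous_version_stable: bool = False
    isolated_feature_has_flag: bool = False
    fix_line_count: Optional[int] = None
    root_cause_clear: bool = False
    estimated_ship_minutes: Optional[int] = None
    capacity_issue: bool = False
    external_dependency_down: bool = False


def select_strategy(s: Situation) -> Optional[Strategy]:
    """Return the first matching strategy, or None when a human must decide."""
    # Rule of thumb: if you can roll back, roll back.
    if (s.recent_deploy_suspected and s.deploy_age_hours is not None
            and s.deploy_age_hours < 24 and s.previous_version_stable):
        return Strategy.ROLLBACK
    # Assumed next-fastest option: flip an existing flag for an isolated feature.
    if s.isolated_feature_has_flag:
        return Strategy.FEATURE_FLAG
    # Hotfix only when it is small (<20 lines), understood, and shippable in <30 min.
    if (s.fix_line_count is not None and s.fix_line_count < 20
            and s.root_cause_clear
            and s.estimated_ship_minutes is not None
            and s.estimated_ship_minutes < 30):
        return Strategy.HOTFIX
    if s.capacity_issue:
        return Strategy.TRAFFIC_MANAGEMENT
    if s.external_dependency_down:
        return Strategy.CIRCUIT_BREAKER
    # No rule matched: choose manually and document why.
    return None
```

Returning `None` instead of guessing keeps the sketch honest: a situation outside the table still needs an engineer's judgment, and the reason for the choice still belongs in the incident record.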

Process

1. Review triage findings - Read INCIDENT-TRIAGE.md. Understand the severity, blast radius, likely cause, and recommended response strategy.
2. Select mitigation strategy - Using the decision framework above, confirm or adjust the strategy recommended during triage. Document why you chose this approach.
3. Implement the minimum viable fix - Execute the selected strategy. Do the least amount of work needed to restore service. Resist the urge to fix adjacent issues or refactor during an incident.
4. Verify mitigation works - Confirm that the mitigation resolved the immediate problem: error rates dropping, latency returning to normal, users able to complete workflows. Confirm with both metrics and a manual check.
5. Monitor for stability - Watch the system for at least 15 minutes after mitigation. Confirm that the fix holds and no new issues emerge. Set up alerts for any regression.
6. Document what was done - Record the exact actions taken, timestamps, and results. This feeds into the postmortem.
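Steps 4 and 5 can be partly automated by checking post-mitigation metrics against a threshold over the 15-minute window. This is a minimal sketch under stated assumptions: the `(timestamp, error_rate)` sample format, the `is_stable` name, and the single error-rate threshold are hypothetical, and it covers error rate only (latency would be checked the same way). It does not replace the manual check.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical sample format: (timestamp, error rate as a fraction; 0.001 == 0.1%).
Sample = Tuple[datetime, float]


def is_stable(
    samples: List[Sample],
    mitigated_at: datetime,
    max_error_rate: float,
    window: timedelta = timedelta(minutes=15),
) -> bool:
    """True if every sample since mitigation is under the threshold and the
    observations cover at least `window` after mitigation."""
    # Ignore pre-mitigation samples; they describe the incident, not the fix.
    after = sorted((t, r) for t, r in samples if t >= mitigated_at)
    if not after:
        return False  # no data is not evidence of stability
    # Any breach after mitigation counts as a regression.
    if any(rate > max_error_rate for _, rate in after):
        return False
    # Require observations spanning the full monitoring window.
    return after[-1][0] - mitigated_at >= window
```

Requiring data that spans the whole window means a quiet dashboard five minutes after a rollback is not enough to declare the incident stable.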

Deliverables

Deliverable	Format	Purpose
Mitigation actions applied	Code/Config changes	Restore service
Incident timeline update	Appended to INCIDENT-TRIAGE.md	Record of actions taken

Incident Timeline Entry

For each mitigation action, record:
- Timestamp: When the action was taken
- Action: What was done (e.g., rolled back to v1.2.3, disabled feature flag X, scaled pods to 10)
- Result: What changed (e.g., error rate dropped from 15% to 0.1%, p99 latency returned to 200 ms)
- Verified by: How success was confirmed (dashboard link, manual test, health check)
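A timeline entry maps naturally onto a small record type that renders the four fields and appends them to the triage document. This is a hedged sketch: the `TimelineEntry` class, the `append_entry` helper, and the nested-bullet Markdown layout are assumptions, not a format the skill prescribes. Timestamps should be timezone-aware so that entries from different responders line up in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TimelineEntry:
    # Hypothetical record mirroring the four fields listed above.
    timestamp: datetime  # should be timezone-aware
    action: str
    result: str
    verified_by: str

    def to_markdown(self) -> str:
        """Render the entry as a bullet block for INCIDENT-TRIAGE.md."""
        ts = self.timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        return (
            f"- Timestamp: {ts}\n"
            f"  - Action: {self.action}\n"
            f"  - Result: {self.result}\n"
            f"  - Verified by: {self.verified_by}\n"
        )


def append_entry(path: str, entry: TimelineEntry) -> None:
    # Open in append mode so earlier entries are never rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry.to_markdown())
```

Appending rather than rewriting the file keeps the timeline in the order actions were taken, which is what the postmortem needs.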

Quality Criteria

- Service is restored: the user-facing impact is resolved
- Mitigation is minimal: no over-fixing, no scope creep during the incident
- Monitoring confirms stability for at least 15 minutes post-mitigation
- Every action is timestamped and documented in the incident timeline
- Mitigation does not introduce new risks or break other features
- If rollback was available and not used, the reason is documented
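Some of these criteria can be checked mechanically before closing the mitigation phase. The sketch below is illustrative only: `MitigationRecord` and `unmet_criteria` are hypothetical names, and it covers only the checkable criteria (restoration, monitoring duration, timestamps, rollback justification). Whether a fix was minimal or introduced new risks still needs human review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class MitigationRecord:
    # Hypothetical summary of one incident's mitigation; field names are illustrative.
    service_restored: bool
    monitored_for: timedelta
    action_timestamps: List[Optional[datetime]] = field(default_factory=list)
    rollback_available: bool = False
    rollback_used: bool = False
    rollback_skip_reason: Optional[str] = None


def unmet_criteria(rec: MitigationRecord) -> List[str]:
    """Return the mechanically checkable quality criteria this record fails."""
    problems: List[str] = []
    if not rec.service_restored:
        problems.append("user-facing impact not resolved")
    if rec.monitored_for < timedelta(minutes=15):
        problems.append("monitored for less than 15 minutes")
    # An empty timeline also fails: there is nothing to feed the postmortem.
    if not rec.action_timestamps or any(ts is None for ts in rec.action_timestamps):
        problems.append("an action is missing or lacks a timestamp")
    if rec.rollback_available and not rec.rollback_used and not rec.rollback_skip_reason:
        problems.append("rollback was available but not used, and no reason is recorded")
    return problems
```

An empty list means only that the automatable checks pass.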

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
