xxxryan

incident-triage

# Install this skill:
npx skills add xxxryan/agent-skills --skill "incident-triage"

This installs a single skill from a multi-skill repository.

# Description

Triage a production incident from symptoms, logs, metrics, and recent deploy info. Produces a hypothesis tree, the fastest checks, likely root cause, and safe mitigations. Use when the user reports outages, latency spikes, error rate increases, stuck jobs, or resource exhaustion.

# SKILL.md


---
name: incident-triage
description: Triage a production incident from symptoms, logs, metrics, and recent deploy info. Produces a hypothesis tree, the fastest checks, likely root cause, and safe mitigations. Use when the user reports outages, latency spikes, error rate increases, stuck jobs, or resource exhaustion.
compatibility: Works with pasted logs/graphs summaries; best if the agent can inspect repo history and config changes.
metadata:
  short-description: Production incident triage playbook
allowed-tools: Read
---


Incident Triage

What you need (ask only if missing)

  • What changed recently (deploy, config, dependency)
  • Primary symptoms (error codes, latency, saturation, partial outage)
  • Scope (one region, one service, all traffic)
  • Example log lines / request IDs

Workflow

  1. Restate symptoms as facts
  2. Build a hypothesis tree
       • Deploy regression
       • Downstream dependency failure
       • Resource saturation (CPU/mem/conn pool)
       • Data issue (bad row, migration)
       • Thundering herd / retry storms
  3. Fast checks (order by speed + confidence)
  4. Mitigation options (safe first)
       • rollback, scale out, disable feature flag, reduce concurrency, tighten retries
  5. Permanent fix outline + prevention
       • tests, alerts, rate limits, circuit breaker, dashboards
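
The "Fast checks" step asks for checks ordered by speed and confidence. One way to make that ordering concrete is sketched below in Python. The `Check` class, the `rank_checks` helper, the confidence-per-minute score, and every check and number in the sample data are assumptions made for illustration; the skill itself does not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One candidate diagnostic check (hypothetical structure)."""
    name: str
    minutes: float      # rough estimate of how long the check takes
    confidence: float   # 0..1: how strongly the result confirms or rules out a hypothesis


def rank_checks(checks):
    """Order checks so that fast, high-confidence ones come first."""
    # Score = confidence gained per minute spent (an assumed heuristic).
    # The floor of 0.1 minutes avoids division by zero for "instant" checks.
    return sorted(checks, key=lambda c: c.confidence / max(c.minutes, 0.1), reverse=True)


# Illustrative data only; real estimates depend on the incident.
checks = [
    Check("Diff the last deploy against the previous release", minutes=5, confidence=0.8),
    Check("Check the dependency status page", minutes=1, confidence=0.4),
    Check("Inspect the connection pool saturation graph", minutes=2, confidence=0.6),
    Check("Replay a failing request in staging", minutes=30, confidence=0.9),
]

ranked = rank_checks(checks)
for c in ranked:
    print(f"{c.minutes:>4} min  conf={c.confidence:.1f}  {c.name}")
```

With these sample numbers the one-minute status-page check ranks first and the thirty-minute staging replay ranks last, even though the replay has the highest confidence: a slow check has to be much more conclusive to justify running it early.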

Output format

Situation summary

...

Hypotheses (ranked)

  1. ...
  2. ...

Fast checks

  • ...

Mitigations (safe first)

  • ...

Likely root cause

...

Follow-up actions

  • ...
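
For tooling that needs to assemble the report programmatically, the section order above can be captured in a small renderer. This is a minimal Python sketch, not part of the skill: `render_report` and its parameters are hypothetical names, and the sample incident is invented purely to exercise the function.

```python
def render_report(summary, hypotheses, checks, mitigations, root_cause, follow_ups):
    """Render triage findings as plain text, in the section order listed above."""

    def numbered(items):
        return "\n".join(f"  {i}. {item}" for i, item in enumerate(items, start=1))

    def bulleted(items):
        return "\n".join(f"  • {item}" for item in items)

    sections = [
        ("Situation summary", summary),
        ("Hypotheses (ranked)", numbered(hypotheses)),
        ("Fast checks", bulleted(checks)),
        ("Mitigations (safe first)", bulleted(mitigations)),
        ("Likely root cause", root_cause),
        ("Follow-up actions", bulleted(follow_ups)),
    ]
    # Each section is a title, a blank line, then its body.
    return "\n\n".join(f"{title}\n\n{body}" for title, body in sections)


# Hypothetical incident, invented for illustration.
report = render_report(
    summary="5xx rate on the checkout service rose shortly after a deploy.",
    hypotheses=["Deploy regression", "Connection pool saturation"],
    checks=["Diff the deploy against the previous release"],
    mitigations=["Roll back the deploy", "Reduce concurrency"],
    root_cause="Unconfirmed; deploy regression is the leading hypothesis.",
    follow_ups=["Add an alert on 5xx rate"],
)
print(report)
```

Keeping the section titles in one ordered list means the "safe first" ordering of mitigations and the ranking of hypotheses remain the caller's responsibility; the renderer only preserves the order it is given.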
