Use when you have a written implementation plan to execute in a separate session with review checkpoints
npx skills add acardozzo/rx-suite --skill "ops-rx"
Install specific skill from multi-skill repository
# Description
>
# SKILL.md
name: ops-rx
description: >
Prescriptive operational and SRE maturity evaluation producing scored diagnostic maps.
Evaluates whether you can OPERATE a system reliably in production β beyond architecture
and code quality. Measures 8 dimensions (32 sub-metrics) against Google SRE, DORA,
FinOps, and AWS Well-Architected frameworks. Produces per-dimension scorecards with
actionable prescriptions and aggregate grades.
triggers:
- "run ops-rx"
- "SRE audit"
- "operational maturity"
- "production readiness"
- "ops review"
- "operations review"
- "SRE maturity"
- "production readiness review"
Prerequisites
None (POSIX only)
Check all dependencies: bash scripts/rx-deps.sh or bash scripts/rx-deps.sh --install
ops-rx β Operational & SRE Maturity Diagnostic
Purpose
Evaluate operational maturity across 8 dimensions with 32 sub-metrics. Produces
objective, repeatable grades and actionable prescriptions for reaching production
excellence.
Dimensions & Weights
| Dim | Name | Weight | Sub-metrics | Source |
|---|---|---|---|---|
| D1 | SLI/SLO/Error Budget | 15% | M1.1βM1.4 | Google SRE Book (ch. 4-5), SLO Workbook |
| D2 | Alerting Quality | 15% | M2.1βM2.4 | Google SRE Book (ch. 6), Ewaschuk philosophy |
| D3 | Incident Response | 15% | M3.1βM3.4 | PagerDuty IR Guide, Google SRE Book (ch. 14) |
| D4 | DORA Metrics | 10% | M4.1βM4.4 | Accelerate (Forsgren, Humble, Kim) |
| D5 | Runbook Coverage | 10% | M5.1βM5.4 | SRE Workbook, Runbook best practices |
| D6 | Capacity & Scaling | 10% | M6.1βM6.4 | AWS Well-Architected Reliability Pillar |
| D7 | Disaster Recovery | 10% | M7.1βM7.4 | AWS DR whitepaper, RTO/RPO patterns |
| D8 | Cost & Efficiency | 15% | M8.1βM8.4 | FinOps Foundation, AWS Cost Optimization |
Sub-metrics
D1: SLI/SLO/Error Budget (15%)
- M1.1: SLI definition β latency, availability, throughput, correctness defined
- M1.2: SLO targets β documented, measurable, stakeholder-agreed
- M1.3: Error budget tracking β budget calculated, burn rate alerts configured
- M1.4: SLO-based decision making β budget informs release velocity and toil prioritization
D2: Alerting Quality (15%)
- M2.1: Signal-to-noise ratio β alerts are actionable, not noisy
- M2.2: Alert severity levels β paging vs ticket vs info, proper routing
- M2.3: Alert documentation β every alert links to a runbook
- M2.4: Alert testing β alerts verified in staging, dead alert cleanup process
D3: Incident Response (15%)
- M3.1: Incident process β defined roles: IC, scribe, comms lead
- M3.2: On-call rotation β fair rotation, escalation paths, handoff process
- M3.3: Post-mortems β blameless, action items tracked, SLO impact noted
- M3.4: Communication templates β status page, stakeholder updates, customer comms
D4: DORA Metrics (10%)
- M4.1: Deployment frequency β how often code ships to production
- M4.2: Lead time for changes β commit to production duration
- M4.3: Change failure rate β % of deployments causing incidents
- M4.4: Mean time to recover β MTTR from incident detection to resolution
D5: Runbook Coverage (10%)
- M5.1: Runbook existence β every service and alert has a runbook
- M5.2: Runbook quality β steps are testable, not stale, include rollback
- M5.3: Automation level β runbook steps automated where possible
- M5.4: Runbook maintenance β review cadence, last-updated tracking
D6: Capacity & Scaling (10%)
- M6.1: Load testing β regular baseline testing, regression tracked
- M6.2: Auto-scaling configured β policies, min/max, cool-down tuned
- M6.3: Resource monitoring β CPU/memory/disk/connections tracked with thresholds
- M6.4: Capacity planning β growth projections, headroom policy documented
D7: Disaster Recovery (10%)
- M7.1: Backup strategy β automated, tested, offsite, encrypted
- M7.2: Recovery testing β DR drills executed, RTO/RPO verified
- M7.3: Multi-region readiness β failover configured, data replication active
- M7.4: Business continuity β degraded mode definitions, priority services identified
D8: Cost & Efficiency (15%)
- M8.1: Resource tagging β all resources tagged, cost allocation enabled
- M8.2: Right-sizing β instance sizing matches load, spot/preemptible usage
- M8.3: Budget alerts β cost anomaly detection, threshold alerts configured
- M8.4: Waste elimination β idle resources, unused storage, over-provisioned instances
Workflow
- Discover β Run
scripts/discover.shto scan the repo for ops artifacts - Grade β Apply thresholds from
references/grading-framework.mdto each sub-metric - Report β Use
references/output-templates.mdto produce the scorecard - Prescribe β For each sub-metric scoring below A, prescribe concrete next steps
Grading Scale
| Grade | Range | Meaning |
|---|---|---|
| A+ | 95β100 | Elite β exemplary operational maturity |
| A | 85β94 | Strong β production-ready with minor gaps |
| B | 70β84 | Adequate β functional but improvement needed |
| C | 50β69 | Weak β significant operational risk |
| D | 30β49 | Poor β not production-ready |
| F | 0β29 | Failing β critical operational gaps |
Output
The skill produces:
1. Per-dimension scorecard with letter grade and numeric score
2. Aggregate weighted score and overall grade
3. Top-5 priority prescriptions ranked by risk Γ effort
4. Mermaid radar chart of all 8 dimensions
Auto-Plan Integration
After generating the scorecard and saving the report to docs/audits/:
1. Save a copy of the report to docs/rx-plans/{this-skill-name}/{date}-report.md
2. For each dimension scoring below 97, invoke the rx-plan skill to create or update the improvement plan at docs/rx-plans/{this-skill-name}/{dimension}/v{N}-{date}-plan.md
3. Update docs/rx-plans/{this-skill-name}/summary.md with current scores
4. Update docs/rx-plans/dashboard.md with overall progress
This happens automatically β the user does not need to run /rx-plan separately.
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.