observability-guidelines

by @mindrally in Development

# Install this skill:

npx skills add Mindrally/skills --skill "observability-guidelines"

This installs one specific skill from the multi-skill repository.

# Description

Observability guidelines for distributed systems using OpenTelemetry, tracing, metrics, and structured logging

# SKILL.md

name: observability-guidelines
description: Observability guidelines for distributed systems using OpenTelemetry, tracing, metrics, and structured logging

Observability Guidelines

Apply these observability principles to ensure comprehensive visibility into distributed systems and microservices.

Core Observability Principles

Guide the development of idiomatic, maintainable, and high-performance code with built-in observability
Enforce modular design and separation of concerns through Clean Architecture
Promote test-driven development and robust observability from the start

OpenTelemetry Integration

Use OpenTelemetry for distributed tracing, metrics, and structured logging
Start and propagate tracing spans across all service boundaries
In Go, call otel.Tracer to obtain a tracer for creating spans and otel.Meter to obtain a meter for recording metrics
Export telemetry to an OpenTelemetry Collector, or to a backend such as Jaeger (traces) or Prometheus (metrics)
Configure appropriate sampling rates for production environments

Distributed Tracing

Trace all incoming requests and propagate context through internal calls
Use middleware to instrument HTTP and gRPC endpoints automatically
Include trace context in all downstream service calls
Create child spans for significant operations within a service
Add relevant attributes to spans for debugging and analysis
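
The propagation steps above can be sketched with the Go standard library alone. This is a minimal illustration, not the OpenTelemetry SDK: it assumes the W3C `traceparent` header format (version `00`), and the names `traceContext`, `parseTraceparent`, `withTrace`, and `traceFrom` are hypothetical. In a real service an instrumentation package such as `otelhttp` would normally do this work.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
)

// traceContext holds the identifiers carried by a W3C traceparent header.
type traceContext struct {
	TraceID  string
	ParentID string
}

// traceKey is an unexported context key type, so other packages cannot collide with it.
type traceKey struct{}

// traceparentRe matches version-00 headers: 00-<32 hex>-<16 hex>-<2 hex>.
var traceparentRe = regexp.MustCompile(`^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$`)

// parseTraceparent extracts the trace and parent span IDs. It rejects malformed
// values and all-zero IDs, which the W3C Trace Context spec defines as invalid.
func parseTraceparent(h string) (traceContext, bool) {
	m := traceparentRe.FindStringSubmatch(h)
	if m == nil || m[1] == strings.Repeat("0", 32) || m[2] == strings.Repeat("0", 16) {
		return traceContext{}, false
	}
	return traceContext{TraceID: m[1], ParentID: m[2]}, true
}

// withTrace is middleware that copies a valid inbound trace context into the
// request context, so handlers and downstream calls can read it.
func withTrace(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if tc, ok := parseTraceparent(r.Header.Get("traceparent")); ok {
			r = r.WithContext(context.WithValue(r.Context(), traceKey{}, tc))
		}
		next.ServeHTTP(w, r)
	})
}

// traceFrom returns the trace context stored by withTrace, if any.
func traceFrom(ctx context.Context) (traceContext, bool) {
	tc, ok := ctx.Value(traceKey{}).(traceContext)
	return tc, ok
}

func main() {
	handler := withTrace(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if tc, ok := traceFrom(r.Context()); ok {
			fmt.Fprintf(w, "trace_id=%s parent_id=%s", tc.TraceID, tc.ParentID)
			return
		}
		fmt.Fprint(w, "untraced")
	}))

	// Simulate an incoming request that already carries trace context.
	req := httptest.NewRequest(http.MethodGet, "/orders", nil)
	req.Header.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	fmt.Println(rec.Body.String())
}
```

Storing the trace context in `context.Context` keeps handler signatures unchanged and lets any internal call that receives the context pick it up.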

Metrics Collection

Monitor these key metrics across all services:

Request latency: Track p50, p90, p95, and p99 percentiles
Throughput: Measure requests per second by endpoint
Error rate: Track 4xx and 5xx responses separately
Resource usage: Monitor CPU, memory, disk, and network utilization
Custom business metrics: Track domain-specific KPIs
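
To make the latency and error-rate items concrete, here is a small sketch that computes nearest-rank percentiles and separate 4xx/5xx rates from raw samples. The function names are hypothetical, and the nearest-rank method is one of several percentile definitions. Production systems usually derive percentiles from histograms in the metrics backend rather than from raw samples like this.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile p (0 < p <= 100) of latency
// samples in milliseconds. It sorts a copy, so the caller's slice is untouched.
func percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// errorRates returns the share of 4xx and 5xx responses, kept separate because
// client errors and server errors usually point to different problems.
func errorRates(statuses []int) (clientRate, serverRate float64) {
	if len(statuses) == 0 {
		return 0, 0
	}
	var client, server int
	for _, s := range statuses {
		switch {
		case s >= 400 && s < 500:
			client++
		case s >= 500 && s < 600:
			server++
		}
	}
	n := float64(len(statuses))
	return float64(client) / n, float64(server) / n
}

func main() {
	// Synthetic latencies of 100ms down to 1ms, deliberately unsorted.
	var latencies []float64
	for i := 100; i >= 1; i-- {
		latencies = append(latencies, float64(i))
	}
	for _, p := range []float64{50, 90, 95, 99} {
		fmt.Printf("p%.0f=%.0fms\n", p, percentile(latencies, p))
	}

	c, s := errorRates([]int{200, 200, 404, 500, 503, 201, 200, 400, 200, 200})
	fmt.Printf("4xx=%.2f 5xx=%.2f\n", c, s)
}
```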

Structured Logging

Include unique request IDs and trace context in all logs for correlation
Use structured logging formats (JSON) for machine parseability
Include relevant context: timestamp, service name, trace ID, span ID
Log at appropriate levels: DEBUG, INFO, WARN, ERROR
Avoid logging sensitive information (PII, credentials)
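
A minimal sketch of these logging rules using Go's standard `log/slog` package (Go 1.21 or later). The field names `service`, `request_id`, `trace_id`, and `span_id`, the list of sensitive keys, and the helpers `newLogger` and `logOnce` are assumptions chosen for illustration, not a prescribed schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// sensitiveKeys lists attribute names to redact. The set is illustrative only.
var sensitiveKeys = map[string]bool{"password": true, "token": true, "email": true}

// newLogger builds a JSON logger that tags every record with the service name
// and replaces the values of sensitive attributes before they are written.
func newLogger(w io.Writer, service string) *slog.Logger {
	h := slog.NewJSONHandler(w, &slog.HandlerOptions{
		Level: slog.LevelInfo, // DEBUG records are dropped at this level
		ReplaceAttr: func(_ []string, a slog.Attr) slog.Attr {
			if sensitiveKeys[a.Key] {
				return slog.String(a.Key, "[REDACTED]")
			}
			return a
		},
	})
	return slog.New(h).With("service", service)
}

// logOnce writes a single INFO record to a buffer and returns it decoded,
// which makes the emitted fields easy to inspect in this example.
func logOnce(service, msg string, kv ...any) map[string]any {
	var buf bytes.Buffer
	newLogger(&buf, service).Info(msg, kv...)
	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}

func main() {
	rec := logOnce("checkout", "payment authorized",
		"request_id", "req-123",
		"trace_id", "4bf92f3577b34da6a3ce929d0e0e4736",
		"span_id", "00f067aa0ba902b7",
		"email", "user@example.com", // redacted by ReplaceAttr
	)
	fmt.Println(rec["service"], rec["level"], rec["trace_id"], rec["email"])
}
```

Redacting in the handler rather than at each call site means a forgotten call site still cannot leak a listed key, although it only catches keys that are on the list.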

Architecture Patterns

Apply Clean Architecture with handlers, services, repositories, and domain models
Use domain-driven design principles for clear boundaries
Prioritize interface-driven development with explicit dependency injection
Prefer composition over inheritance; favor small, purpose-specific interfaces

Correlation and Context

Propagate context through the entire request lifecycle
Use correlation IDs for request tracking across services
Include service version and deployment information in telemetry
Tag traces with relevant business context for filtering
Enable trace-to-log and log-to-trace correlation
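
One way to carry a correlation ID across services is to keep it in the request context and attach it to every outgoing call in a custom `http.RoundTripper`. The sketch below assumes a header named `X-Correlation-ID`, which is a common convention rather than a standard. All type and function names, and the `inventory.internal` URL, are hypothetical.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
)

// correlationHeader is an assumed header name; pick one and use it everywhere.
const correlationHeader = "X-Correlation-ID"

type correlationKey struct{}

// ensureCorrelation reuses the inbound ID when present and mints a random
// 128-bit ID otherwise, then stores it in the context.
func ensureCorrelation(ctx context.Context, inbound string) context.Context {
	if inbound == "" {
		b := make([]byte, 16)
		if _, err := rand.Read(b); err != nil {
			panic(err)
		}
		inbound = hex.EncodeToString(b)
	}
	return context.WithValue(ctx, correlationKey{}, inbound)
}

// correlationID returns the ID stored in ctx, or "" if there is none.
func correlationID(ctx context.Context) string {
	id, _ := ctx.Value(correlationKey{}).(string)
	return id
}

// correlationTransport copies the ID from the request context onto outgoing requests.
type correlationTransport struct{ base http.RoundTripper }

func (t correlationTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	if id := correlationID(r.Context()); id != "" {
		r = r.Clone(r.Context()) // a RoundTripper must not modify the caller's request
		r.Header.Set(correlationHeader, id)
	}
	return t.base.RoundTrip(r)
}

// roundTripFunc adapts a function to http.RoundTripper, used here as a fake downstream.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// sentHeader performs one outbound call and reports the header the downstream saw.
func sentHeader(inbound string) string {
	var seen string
	fake := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get(correlationHeader)
		return &http.Response{StatusCode: http.StatusOK, Header: make(http.Header), Body: http.NoBody}, nil
	})
	client := &http.Client{Transport: correlationTransport{base: fake}}

	ctx := ensureCorrelation(context.Background(), inbound)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://inventory.internal/stock", nil)
	if err != nil {
		panic(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return seen
}

func main() {
	fmt.Println("propagated:", sentHeader("req-42"))
	fmt.Println("generated length:", len(sentHeader("")))
}
```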

Alerting and Dashboards

Create dashboards for service health and business metrics
Set up alerts based on SLOs and error budgets
Use anomaly detection for proactive issue identification
Document runbooks for common alert scenarios
Review and tune alerts regularly to reduce noise
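
Alerting on SLOs and error budgets often comes down to a burn-rate calculation. The sketch below is one possible formulation: the two-window rule and the 14.4 threshold (the rate that would spend 2% of a 30-day budget in one hour) are assumptions borrowed from common SRE practice, and the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// window holds request counts observed over one time window.
type window struct {
	Failed int
	Total  int
}

// burnRate is the observed error ratio divided by the error budget (1 - SLO).
// A value of 1 spends the budget exactly over the SLO period; higher is faster.
func burnRate(failed, total int, slo float64) float64 {
	if total == 0 || failed == 0 {
		return 0
	}
	budget := 1 - slo
	if budget <= 0 {
		return math.Inf(1) // a 100% SLO leaves no budget, so any failure is unbounded burn
	}
	return (float64(failed) / float64(total)) / budget
}

// shouldPage pages only when both a long and a short window burn faster than
// the threshold, so a spike that has already ended does not wake anyone up.
func shouldPage(long, short window, slo, threshold float64) bool {
	return burnRate(long.Failed, long.Total, slo) > threshold &&
		burnRate(short.Failed, short.Total, slo) > threshold
}

func main() {
	const slo = 0.999      // 99.9% availability target (example value)
	const threshold = 14.4 // assumed fast-burn threshold, see the note above

	lastHour := window{Failed: 300, Total: 15000}
	lastFiveMin := window{Failed: 30, Total: 1250}

	fmt.Printf("1h burn rate: %.1f\n", burnRate(lastHour.Failed, lastHour.Total, slo))
	fmt.Printf("5m burn rate: %.1f\n", burnRate(lastFiveMin.Failed, lastFiveMin.Total, slo))
	fmt.Println("page:", shouldPage(lastHour, lastFiveMin, slo, threshold))
}
```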

Instrumentation Best Practices

Instrument at service boundaries (entry/exit points)
Add custom spans for database operations and external calls
Include relevant attributes (user ID, request type, etc.)
Avoid over-instrumentation that creates noise
Use semantic conventions for consistent attribute naming
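
Instrumenting at a boundary can be done by wrapping an interface, which keeps the instrumentation out of business logic. The sketch below uses a stand-in `spanRecord` type instead of a tracing SDK. `OrderRepository`, `memRepo`, `tracedRepo`, `lookup`, and the `order.id` attribute key are all hypothetical; real attribute names should follow the OpenTelemetry semantic conventions where one exists.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// OrderRepository is a hypothetical, deliberately small repository interface.
type OrderRepository interface {
	FindOrder(ctx context.Context, id string) (string, error)
}

var errNotFound = errors.New("order not found")

// memRepo is an in-memory implementation standing in for a real database.
type memRepo map[string]string

func (m memRepo) FindOrder(_ context.Context, id string) (string, error) {
	status, ok := m[id]
	if !ok {
		return "", errNotFound
	}
	return status, nil
}

// spanRecord is a stand-in for a finished span; real code would use a tracing SDK.
type spanRecord struct {
	Name     string
	Attrs    map[string]string
	Duration time.Duration
	Err      error
}

// tracedRepo decorates any OrderRepository and records one span per call.
// It is not safe for concurrent use, since this sketch appends without locking.
type tracedRepo struct {
	next  OrderRepository
	spans *[]spanRecord
}

func (t tracedRepo) FindOrder(ctx context.Context, id string) (string, error) {
	start := time.Now()
	status, err := t.next.FindOrder(ctx, id)
	*t.spans = append(*t.spans, spanRecord{
		Name:     "OrderRepository.FindOrder",
		Attrs:    map[string]string{"order.id": id}, // illustrative attribute key
		Duration: time.Since(start),
		Err:      err,
	})
	return status, err
}

// lookup runs one instrumented call and returns the result with its span.
func lookup(id string) (string, spanRecord, error) {
	var spans []spanRecord
	var repo OrderRepository = tracedRepo{next: memRepo{"o-1": "paid"}, spans: &spans}
	status, err := repo.FindOrder(context.Background(), id)
	return status, spans[0], err
}

func main() {
	status, sp, err := lookup("o-1")
	fmt.Println(status, sp.Name, sp.Attrs["order.id"], err)

	_, sp, err = lookup("missing")
	fmt.Println(sp.Err == err, err)
}
```

Because the wrapper records exactly one span per repository call, it adds visibility at the exit point without instrumenting every internal helper.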

Production Considerations

Configure appropriate sampling rates to balance visibility and cost
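Use head-based sampling, which decides at the root span, so every service keeps or drops the same trace consistently
Add tail-based sampling, which decides after a trace completes, to retain traces such as those containing errors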
Set retention policies based on debugging needs
Monitor observability infrastructure health
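
The two sampling approaches can be sketched as plain functions. This is a simplified illustration under stated assumptions, not the OpenTelemetry sampler implementation: `headSample` derives its decision from the low 64 bits of a 32-character hex trace ID, and `tailKeep`, `spanSummary`, and the latency threshold are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// headSample makes a deterministic keep/drop decision from the trace ID alone,
// so every service that sees the same ID reaches the same answer.
func headSample(traceID string, ratio float64) bool {
	switch {
	case ratio >= 1:
		return true
	case ratio <= 0 || len(traceID) != 32:
		return false
	}
	// Read the low 64 bits of the ID and keep the top 53 of them, which
	// convert to float64 without rounding.
	low, err := strconv.ParseUint(traceID[16:], 16, 64)
	if err != nil {
		return false
	}
	return float64(low>>11) < ratio*float64(uint64(1)<<53)
}

// spanSummary is a hypothetical, minimal view of a finished span.
type spanSummary struct {
	Failed   bool
	Duration time.Duration
}

// tailKeep decides after the trace has finished: traces with a failed or slow
// span are always kept, and the rest fall back to the baseline ratio.
func tailKeep(traceID string, spans []spanSummary, slow time.Duration, baseline float64) bool {
	for _, s := range spans {
		if s.Failed || s.Duration >= slow {
			return true
		}
	}
	return headSample(traceID, baseline)
}

func main() {
	id := "4bf92f3577b34da6a3ce929d0e0e4736"
	fmt.Println("head 100%:", headSample(id, 1))
	fmt.Println("head 0%:", headSample(id, 0))

	failed := []spanSummary{{Duration: 20 * time.Millisecond}, {Failed: true}}
	fmt.Println("tail keeps failed trace:", tailKeep(id, failed, time.Second, 0))
}
```

Tail-based sampling needs all spans of a trace in one place before it can decide, which is why it is typically done in a collector rather than inside each service.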

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
