mindrally

observability-guidelines

# Install this skill:
npx skills add Mindrally/skills --skill "observability-guidelines"

This installs one specific skill from a multi-skill repository.

# Description

Observability guidelines for distributed systems using OpenTelemetry, tracing, metrics, and structured logging

# SKILL.md


---
name: observability-guidelines
description: Observability guidelines for distributed systems using OpenTelemetry, tracing, metrics, and structured logging
---


Observability Guidelines

Apply these observability principles to ensure comprehensive visibility into distributed systems and microservices.

Core Observability Principles

  • Guide the development of idiomatic, maintainable, and high-performance code with built-in observability
  • Enforce modular design and separation of concerns through Clean Architecture
  • Promote test-driven development and robust observability from the start

OpenTelemetry Integration

  • Use OpenTelemetry for distributed tracing, metrics, and structured logging
  • Start and propagate tracing spans across all service boundaries
  • In Go, obtain a tracer with otel.Tracer to create spans and a meter with otel.Meter to record metrics
  • Export telemetry to an OpenTelemetry Collector, or directly to a backend such as Jaeger (traces) or Prometheus (metrics)
  • Configure appropriate sampling rates for production environments

Distributed Tracing

  • Trace all incoming requests and propagate context through internal calls
  • Use middleware to instrument HTTP and gRPC endpoints automatically
  • Include trace context in all downstream service calls
  • Create child spans for significant operations within a service
  • Add relevant attributes to spans for debugging and analysis
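
The tracing bullets above can be sketched with the standard library alone. This is a minimal illustration, not the OpenTelemetry SDK: it assumes the W3C `traceparent` header format (`00-<trace-id>-<parent-id>-<flags>`), always marks outgoing requests as sampled, and uses hypothetical names (`TraceContext`, `withTrace`, `injectTraceparent`). In a real service the OpenTelemetry HTTP and gRPC instrumentation would normally do this work.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// traceKey is an unexported context key type, so other packages cannot collide with it.
type traceKey struct{}

// TraceContext (hypothetical) carries the identifiers from a traceparent header.
type TraceContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
}

// isHex reports whether s is exactly n lowercase hex characters.
func isHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// parseTraceparent parses "00-<trace-id>-<parent-id>-<flags>" and rejects
// malformed values and all-zero IDs.
func parseTraceparent(h string) (TraceContext, bool) {
	p := strings.Split(h, "-")
	if len(p) != 4 || p[0] != "00" || !isHex(p[1], 32) || !isHex(p[2], 16) || !isHex(p[3], 2) {
		return TraceContext{}, false
	}
	if strings.Trim(p[1], "0") == "" || strings.Trim(p[2], "0") == "" {
		return TraceContext{}, false
	}
	return TraceContext{TraceID: p[1], SpanID: p[2]}, true
}

// randomHex returns 2*nBytes hex characters from crypto/rand.
func randomHex(nBytes int) string {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no sensible recovery in this sketch
	}
	return hex.EncodeToString(b)
}

// withTrace continues the caller's trace when a valid header is present,
// otherwise starts a new trace, and always creates a new span ID for this hop.
func withTrace(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tc, ok := parseTraceparent(r.Header.Get("traceparent"))
		if !ok {
			tc.TraceID = randomHex(16)
		}
		tc.SpanID = randomHex(8)
		ctx := context.WithValue(r.Context(), traceKey{}, tc)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// injectTraceparent copies the current trace context onto a downstream request.
// Assumption: the flags byte is hard-coded to 01 (sampled).
func injectTraceparent(ctx context.Context, out *http.Request) {
	if tc, ok := ctx.Value(traceKey{}).(TraceContext); ok {
		out.Header.Set("traceparent", fmt.Sprintf("00-%s-%s-01", tc.TraceID, tc.SpanID))
	}
}

func main() {
	tc, ok := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tc.TraceID, tc.SpanID, ok)
}
```

Wrapping a router once with `withTrace` instruments every endpoint behind it, and calling `injectTraceparent` before each outgoing request keeps the trace ID intact across service boundaries.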

Metrics Collection

Monitor these key metrics across all services:

  • Request latency: Track p50, p90, p95, and p99 percentiles
  • Throughput: Measure requests per second by endpoint
  • Error rate: Track 4xx and 5xx responses separately
  • Resource usage: Monitor CPU, memory, disk, and network utilization
  • Custom business metrics: Track domain-specific KPIs
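
To make the latency and error-rate bullets concrete, here is a small sketch that computes percentiles from raw samples with the nearest-rank method and splits 4xx from 5xx responses. The function names are hypothetical, and keeping raw samples in memory is an assumption made for clarity; production systems more commonly record latencies into histograms and let the metrics backend estimate percentiles.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (1..100) of latencies in milliseconds
// using the nearest-rank method. It returns 0 for an empty slice.
func percentile(samplesMs []float64, p int) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...) // copy; leave the caller's slice untouched
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer form of ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// errorRates reports client (4xx) and server (5xx) error rates separately.
func errorRates(statuses []int) (clientRate, serverRate float64) {
	if len(statuses) == 0 {
		return 0, 0
	}
	var c4, c5 int
	for _, s := range statuses {
		switch {
		case s >= 500 && s <= 599:
			c5++
		case s >= 400 && s <= 499:
			c4++
		}
	}
	n := float64(len(statuses))
	return float64(c4) / n, float64(c5) / n
}

func main() {
	var lat []float64
	for i := 1; i <= 100; i++ {
		lat = append(lat, float64(i))
	}
	for _, p := range []int{50, 90, 95, 99} {
		fmt.Printf("p%d = %.0f ms\n", p, percentile(lat, p))
	}
	client, server := errorRates([]int{200, 200, 404, 500})
	fmt.Println("4xx rate:", client, "5xx rate:", server)
}
```

Tracking the two error rates separately matters because a spike in 4xx usually points at callers, while a spike in 5xx points at the service itself.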

Structured Logging

  • Include unique request IDs and trace context in all logs for correlation
  • Use structured logging formats (JSON) for machine parseability
  • Include relevant context: timestamp, service name, trace ID, span ID
  • Log at appropriate levels: DEBUG, INFO, WARN, ERROR
  • Avoid logging sensitive information (PII, credentials)
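
One way to follow the logging bullets above is Go's standard `log/slog` package (Go 1.21 or later). The sketch below emits JSON records that carry a service name, trace ID, and span ID, and shows a value type that redacts itself when logged. The `Secret` type, the field names `trace_id` and `span_id`, and the service name `checkout` are illustrative assumptions rather than a required schema.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// Secret (hypothetical) redacts itself whenever it is logged through slog.
type Secret string

// LogValue implements slog.LogValuer, so handlers never see the raw value.
func (Secret) LogValue() slog.Value { return slog.StringValue("[REDACTED]") }

// newLogger returns a JSON logger that stamps every record with the service name.
// The JSON handler adds the time, level, and msg keys on its own.
func newLogger(w io.Writer, service string) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("service", service)
}

// logLogin emits one INFO record carrying the trace correlation fields.
func logLogin(ctx context.Context, l *slog.Logger, traceID, spanID string, password Secret) {
	l.InfoContext(ctx, "login attempt",
		"trace_id", traceID,
		"span_id", spanID,
		"password", password, // rendered as "[REDACTED]"
	)
}

// captureRecord logs into a buffer and decodes the resulting JSON line.
func captureRecord() map[string]any {
	var buf bytes.Buffer
	logger := newLogger(&buf, "checkout")
	logLogin(context.Background(), logger,
		"4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", "hunter2")
	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}

func main() {
	fmt.Println(captureRecord())
}
```

Redacting at the type level means a credential stays out of the logs even when a caller forgets to filter it at the call site.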

Architecture Patterns

  • Apply Clean Architecture with handlers, services, repositories, and domain models
  • Use domain-driven design principles for clear boundaries
  • Prioritize interface-driven development with explicit dependency injection
  • Prefer composition over inheritance; favor small, purpose-specific interfaces
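
The patterns above combine naturally with observability: when a service depends on a small interface, instrumentation can be added by wrapping that interface instead of editing business code. The following sketch assumes hypothetical names throughout (`UserRepository`, `Recorder`, `instrumentedRepo`, `Greeter`); a real `Recorder` adapter would forward to a metrics or tracing library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// UserRepository is a small, purpose-specific interface owned by the service layer.
type UserRepository interface {
	FindName(ctx context.Context, id string) (string, error)
}

// memoryRepo is a stand-in repository backed by a map.
type memoryRepo map[string]string

func (m memoryRepo) FindName(_ context.Context, id string) (string, error) {
	if name, ok := m[id]; ok {
		return name, nil
	}
	return "", errors.New("user not found")
}

// Recorder is the observability port seen by the decorator.
type Recorder interface {
	Observe(op string, d time.Duration, err error)
}

// countingRecorder is a trivial Recorder that counts calls and failures.
type countingRecorder struct{ calls, errs int }

func (c *countingRecorder) Observe(_ string, _ time.Duration, err error) {
	c.calls++
	if err != nil {
		c.errs++
	}
}

// instrumentedRepo decorates any UserRepository through composition.
type instrumentedRepo struct {
	next UserRepository
	rec  Recorder
}

func (r instrumentedRepo) FindName(ctx context.Context, id string) (string, error) {
	start := time.Now()
	name, err := r.next.FindName(ctx, id)
	r.rec.Observe("UserRepository.FindName", time.Since(start), err)
	return name, err
}

// Greeter is the service; its dependency is injected explicitly as an interface.
type Greeter struct{ users UserRepository }

func (g Greeter) Greet(ctx context.Context, id string) (string, error) {
	name, err := g.users.FindName(ctx, id)
	if err != nil {
		return "", fmt.Errorf("greet %q: %w", id, err)
	}
	return "hello, " + name, nil
}

// demo wires the pieces together and performs one successful and one failing call.
func demo() (greeting string, calls, errs int) {
	rec := &countingRecorder{}
	svc := Greeter{users: instrumentedRepo{next: memoryRepo{"u1": "Ada"}, rec: rec}}
	ctx := context.Background()
	greeting, _ = svc.Greet(ctx, "u1")
	_, _ = svc.Greet(ctx, "missing")
	return greeting, rec.calls, rec.errs
}

func main() {
	fmt.Println(demo())
}
```

Because `Greeter` only knows the interface, the same service can run with or without instrumentation, which also keeps unit tests free of telemetry setup.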

Correlation and Context

  • Propagate context through the entire request lifecycle
  • Use correlation IDs for request tracking across services
  • Include service version and deployment information in telemetry
  • Tag traces with relevant business context for filtering
  • Enable trace-to-log and log-to-trace correlation
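
A minimal sketch of the correlation bullets above, assuming a correlation ID stored in `context.Context` and forwarded in a request header. The header name `X-Correlation-ID`, the URL, and all function names are assumptions chosen for illustration; `service.version` follows the OpenTelemetry resource attribute name.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// correlationHeader is an assumed header name; use whatever your services agree on.
const correlationHeader = "X-Correlation-ID"

// correlationKey is unexported so no other package can overwrite the value.
type correlationKey struct{}

// withCorrelationID returns a child context that carries id.
func withCorrelationID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, correlationKey{}, id)
}

// correlationID returns the ID stored in ctx, or "" when none is present.
func correlationID(ctx context.Context) string {
	id, _ := ctx.Value(correlationKey{}).(string)
	return id
}

// telemetryFields builds the fields to attach to every log line, span, or metric.
func telemetryFields(ctx context.Context, version string) map[string]string {
	return map[string]string{
		"correlation_id":  correlationID(ctx),
		"service.version": version,
	}
}

// newOutgoing builds a downstream request that carries the correlation ID.
func newOutgoing(ctx context.Context, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if id := correlationID(ctx); id != "" {
		req.Header.Set(correlationHeader, id)
	}
	return req, nil
}

// propagated shows the round trip: context in, header out.
func propagated(id string) string {
	ctx := withCorrelationID(context.Background(), id)
	req, err := newOutgoing(ctx, "http://inventory.example/items") // hypothetical URL
	if err != nil {
		panic(err)
	}
	return req.Header.Get(correlationHeader)
}

func main() {
	fmt.Println(propagated("req-123"))
	fmt.Println(telemetryFields(withCorrelationID(context.Background(), "req-123"), "1.4.2"))
}
```

Attaching the same fields to logs and spans is what makes trace-to-log and log-to-trace lookups possible in the backend.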

Alerting and Dashboards

  • Create dashboards for service health and business metrics
  • Set up alerts based on SLOs and error budgets
  • Use anomaly detection for proactive issue identification
  • Document runbooks for common alert scenarios
  • Review and tune alerts regularly to reduce noise
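
Alerting on SLOs and error budgets comes down to simple arithmetic, sketched here. The error budget is `1 - SLO`, and the burn rate is the observed error rate divided by that budget, so a burn rate of 1 spends the budget exactly over the SLO window. The function names and the threshold of 14.4 are illustrative assumptions, and requiring both a long and a short window to exceed the threshold is one common way to reduce noisy pages.

```go
package main

import "fmt"

// errorBudget returns the allowed failure fraction for an SLO target such as 0.999.
func errorBudget(slo float64) float64 { return 1 - slo }

// burnRate is the observed error rate divided by the error budget.
// It assumes 0 < slo < 1 and returns 0 when there was no traffic.
func burnRate(failed, total int, slo float64) float64 {
	if total == 0 {
		return 0
	}
	return (float64(failed) / float64(total)) / errorBudget(slo)
}

// shouldPage fires only when both the long and the short window burn faster
// than the threshold, which filters out brief spikes and stale incidents.
func shouldPage(longWindow, shortWindow, threshold float64) bool {
	return longWindow >= threshold && shortWindow >= threshold
}

func main() {
	const slo = 0.999
	long := burnRate(150, 10000, slo) // e.g. the last hour
	short := burnRate(20, 1000, slo)  // e.g. the last five minutes
	fmt.Printf("long=%.1f short=%.1f page=%v\n", long, short, shouldPage(long, short, 14.4))
}
```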

Instrumentation Best Practices

  • Instrument at service boundaries (entry/exit points)
  • Add custom spans for database operations and external calls
  • Include relevant attributes (request type, an opaque user ID, etc.) and keep them free of PII
  • Avoid over-instrumentation that creates noise
  • Use semantic conventions for consistent attribute naming

Production Considerations

  • Configure appropriate sampling rates to balance visibility and cost
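  • Use head-based sampling (decided when the trace starts) so every service keeps or drops the same traces
  • Implement tail-based sampling (decided after the trace completes) to retain traces that contain errors or unusual latency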
  • Set retention policies based on debugging needs
  • Monitor observability infrastructure health
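
The two sampling strategies above differ in when the decision is made, which the following sketch illustrates. `headSample` decides from the trace ID alone, so any service applying the same ratio reaches the same verdict; `tailKeep` looks at finished spans and keeps traces with errors or slow spans. Both functions and the `Span` type are hypothetical, and the FNV hash is a stand-in chosen for brevity, not the algorithm used by OpenTelemetry's own samplers.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// headSample makes a deterministic keep/drop decision at trace start.
// ratio is the fraction of traces to keep, from 0 to 1.
func headSample(traceID string, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	h := fnv.New64a()
	h.Write([]byte(traceID)) // hash.Hash writes never return an error
	return float64(h.Sum64())/float64(math.MaxUint64) < ratio
}

// Span is a minimal record of a finished span, enough for a tail decision.
type Span struct {
	Name       string
	DurationMs int
	Err        bool
}

// tailKeep decides after the trace has completed: keep it when any span
// failed or took at least slowMs milliseconds.
func tailKeep(spans []Span, slowMs int) bool {
	for _, s := range spans {
		if s.Err || s.DurationMs >= slowMs {
			return true
		}
	}
	return false
}

func main() {
	id := "4bf92f3577b34da6a3ce929d0e0e4736"
	fmt.Println("head decision is stable:", headSample(id, 0.1) == headSample(id, 0.1))
	trace := []Span{{Name: "GET /cart", DurationMs: 12}, {Name: "db.query", DurationMs: 8, Err: true}}
	fmt.Println("tail keeps failing trace:", tailKeep(trace, 500))
}
```

Head sampling is cheap because dropped traces are never recorded, while tail sampling needs the full trace buffered somewhere (typically in a collector) before it can decide.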

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
