netsapiensis

rocky-opentelemetry

# Install this skill:
npx skills add netsapiensis/claude-code-skills --skill "rocky-opentelemetry"

Install specific skill from multi-skill repository

# Description

OpenTelemetry Collector administration on Rocky Linux 8/9 including otelcol-contrib installation, configuration of receivers, processors, exporters, and pipelines for metrics, traces, and logs. Covers OTLP, Prometheus, filelog, hostmetrics, journald receivers and OpenSearch, OTLP, Prometheus exporters. Use when setting up observability pipelines or configuring the OTel collector.

# SKILL.md


---
name: rocky-opentelemetry
description: OpenTelemetry Collector administration on Rocky Linux 8/9 including otelcol-contrib installation, configuration of receivers, processors, exporters, and pipelines for metrics, traces, and logs. Covers OTLP, Prometheus, filelog, hostmetrics, journald receivers and OpenSearch, OTLP, Prometheus exporters. Use when setting up observability pipelines or configuring the OTel collector.
---


OpenTelemetry Collector Administration

Installation, configuration, receivers, processors, exporters, and pipeline patterns for the OpenTelemetry Collector on Rocky Linux 8/9.

Prerequisite: See rocky-foundation for OS detection and safety tier definitions.

Installation

RPM Install (otelcol-contrib)

# Download and install otelcol-contrib RPM  # [CONFIRM]
# Check latest version at: https://github.com/open-telemetry/opentelemetry-collector-releases/releases
OTEL_VERSION="0.96.0"  # Update to latest

# Download RPM  # [CONFIRM]
curl -Lo /tmp/otelcol-contrib.rpm \
  "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_amd64.rpm"

# Install  # [CONFIRM]
dnf install -y /tmp/otelcol-contrib.rpm

# Verify installation  # [READ-ONLY]
otelcol-contrib --version
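
To see exactly which receivers, processors, and exporters a given build ships with, recent otelcol-contrib releases include a components subcommand (older releases may lack it):

# List components compiled into this binary  # [READ-ONLY]
otelcol-contrib components | head -40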

Systemd Service Setup

The RPM typically creates a systemd service. If not:

# /etc/systemd/system/otelcol-contrib.service  # [CONFIRM]
[Unit]
Description=OpenTelemetry Collector Contrib
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=otelcol
Group=otelcol
ExecStart=/usr/bin/otelcol-contrib --config=/etc/otelcol-contrib/config.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
# MemoryLimit= is deprecated; MemoryMax= is the cgroup v2 equivalent
MemoryMax=512M

# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/log/otelcol /var/lib/otelcol
PrivateTmp=yes

[Install]
WantedBy=multi-user.target

# Create user  # [CONFIRM]
useradd -r -s /sbin/nologin -d /var/lib/otelcol otelcol

# Create directories  # [CONFIRM]
mkdir -p /etc/otelcol-contrib /var/log/otelcol /var/lib/otelcol
chown otelcol:otelcol /var/log/otelcol /var/lib/otelcol

# Enable and start  # [CONFIRM]
systemctl daemon-reload
systemctl enable --now otelcol-contrib

# Check status  # [READ-ONLY]
systemctl status otelcol-contrib
journalctl -u otelcol-contrib --since "5 min ago"

Validate Configuration

# Validate config before applying  # [READ-ONLY]
otelcol-contrib validate --config=/etc/otelcol-contrib/config.yaml

# Reload config  # [CONFIRM]
systemctl reload otelcol-contrib
# Or: kill -HUP $(pgrep otelcol)
# Note: not every collector version re-reads its config on SIGHUP;
# restart if the change does not take effect
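
A safe pattern is to chain validation and restart so a broken config never takes down a working collector; a minimal sketch using the paths from this skill:

# Validate first; restart only if the config parses  # [CONFIRM]
otelcol-contrib validate --config=/etc/otelcol-contrib/config.yaml \
  && systemctl restart otelcol-contrib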

Configuration Structure

The OTel Collector config has four main sections (plus optional extensions, shown later in this skill):

# /etc/otelcol-contrib/config.yaml
receivers:    # How data gets in
processors:   # How data is transformed
exporters:    # Where data goes out
service:      # Ties receivers -> processors -> exporters into pipelines

Minimal Working Configuration

# /etc/otelcol-contrib/config.yaml  # [CONFIRM]
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
    spike_limit_mib: 100
  batch:
    timeout: 5s
    send_batch_size: 1000

exporters:
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
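
With the debug exporter in place, smoke-test the pipeline end to end by posting a minimal OTLP/JSON log record to the HTTP receiver; a sketch assuming the collector listens on localhost:4318 as configured above:

# Send a test log record over OTLP/HTTP  # [READ-ONLY]
curl -s -X POST http://localhost:4318/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},"scopeLogs":[{"logRecords":[{"body":{"stringValue":"hello from curl"},"severityText":"INFO"}]}]}]}'

# The debug exporter should log the received record  # [READ-ONLY]
journalctl -u otelcol-contrib --since "1 min ago"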

Receivers

OTLP Receiver

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 4
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "https://example.com"

Prometheus Receiver (Scrape Targets)

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'node-exporter'
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:9100']
              labels:
                env: production

        - job_name: 'opensearch'
          scrape_interval: 30s
          metrics_path: /_prometheus/metrics
          static_configs:
            - targets: ['localhost:9200']
          basic_auth:
            username: admin
            password: admin
          tls_config:
            insecure_skip_verify: true
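
Before wiring targets into the receiver, confirm they are scrapable from the collector host; these checks assume node_exporter on 9100 and the OpenSearch prometheus-exporter plugin from the config above:

# Verify scrape targets respond  # [READ-ONLY]
curl -s http://localhost:9100/metrics | head -5
curl -sku admin:admin 'https://localhost:9200/_prometheus/metrics' | head -5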

Filelog Receiver (Log Files)

receivers:
  filelog:
    include:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
      - /var/log/myapp/*.log
    exclude:
      - /var/log/nginx/*.gz
    start_at: end                  # 'beginning' or 'end'
    include_file_name: true
    include_file_path: true
    operators:
      - type: regex_parser
        regex: '^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
        timestamp:
          parse_from: attributes.time_local
          layout_type: gotime          # the layout below is a Go reference time
          layout: '02/Jan/2006:15:04:05 -0700'
        severity:
          parse_from: attributes.status
          mapping:
            error: ['500', '502', '503', '504']
            warn: ['400', '401', '403', '404']
            info: ['200', '201', '301', '302']

  filelog/json:
    include:
      - /var/log/myapp/app.json.log
    start_at: end
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
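
To test the nginx parser without waiting for real traffic, append a synthetic line in the expected format and watch the collector's output (this sketch assumes the debug exporter is wired into the logs pipeline):

# Append a line matching the regex above  # [CONFIRM]
echo '127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /test HTTP/1.1" 200 512' \
  >> /var/log/nginx/access.log

# Watch the collector pick it up  # [READ-ONLY]
journalctl -u otelcol-contrib -f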

Host Metrics Receiver

receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      disk: {}
      filesystem: {}
      load: {}
      network: {}
      processes: {}
      process:
        include:
          match_type: regexp
          names:
            - opensearch
            - nginx
            - php-fpm
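
Once a metrics pipeline and exporter are in place (the Prometheus exporter on :8889 is shown later in this skill), a quick spot-check confirms host metrics are flowing:

# Spot-check exported host metrics  # [READ-ONLY]
curl -s http://localhost:8889/metrics | grep -m 5 system_cpu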

Journald Receiver

receivers:
  journald:
    directory: /var/log/journal
    units:
      - sshd
      - nginx
      - opensearch
      - otelcol-contrib
    priority: info                  # Minimum priority

Note: The otelcol user needs access to the journal:

# Add otelcol to systemd-journal group  # [CONFIRM]
usermod -aG systemd-journal otelcol
systemctl restart otelcol-contrib
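
Verify the group membership took effect by reading the journal as the otelcol user (sudo runs the command directly, so the nologin shell is not a problem):

# Confirm journal access as the service user  # [READ-ONLY]
id otelcol
sudo -u otelcol journalctl -u sshd -n 1 --no-pager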

Processors

Mandatory Processors

Always include memory_limiter and batch in every pipeline.

processors:
  # MUST be first in pipeline -- prevents OOM
  memory_limiter:
    check_interval: 5s
    limit_mib: 400           # Hard limit
    spike_limit_mib: 100     # Headroom; soft limit = limit_mib - spike_limit_mib

  # SHOULD be last before exporters -- improves throughput
  batch:
    timeout: 5s
    send_batch_size: 1000
    send_batch_max_size: 1500
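
Keep limit_mib comfortably below the systemd MemoryMax cap (400 MiB vs. 512 MiB here), since the Go runtime needs headroom above the limiter's threshold; compare the live figures with systemctl:

# Compare live memory usage against the unit cap  # [READ-ONLY]
systemctl show otelcol-contrib -p MemoryCurrent -p MemoryMax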

Filter Processor

processors:
  filter/drop-health:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
    metrics:
      metric:
        - 'name == "system.cpu.time" and resource.attributes["host.name"] == "test-host"'
    logs:
      log_record:
        - 'severity_number < 9'    # Drop below INFO (9 = INFO, 13 = WARN)

Transform Processor

processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["service.name"], "myapp") where resource.attributes["service.name"] == nil
          - set(attributes["env"], "production")
    trace_statements:
      - context: span
        statements:
          - set(attributes["deployment.environment"], "production")
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["env"], "production")

Resource Processor

processors:
  resource:
    attributes:
      - key: host.name
        from_attribute: host.name
        action: upsert
      - key: environment
        value: production
        action: insert
      - key: service.namespace
        value: myteam
        action: insert

Attributes Processor

processors:
  attributes/remove-sensitive:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: db.statement
        action: hash     # Hash sensitive values
      - key: user.email
        action: delete

Exporters

OpenSearch Exporter

exporters:
  opensearch:
    http:
      endpoint: https://localhost:9200
      tls:
        insecure_skip_verify: true    # For self-signed certs
      auth:
        authenticator: basicauth/opensearch
    logs_index: otel-logs
    traces_index: otel-traces
    # Retry and queue
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

extensions:
  basicauth/opensearch:
    client_auth:
      username: admin
      password: admin

OTLP Exporter (to Another Collector)

exporters:
  otlp:
    endpoint: gateway-collector:4317
    tls:
      insecure: false
      ca_file: /etc/otelcol-contrib/certs/ca.pem
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      queue_size: 5000

  otlphttp:
    endpoint: https://gateway-collector:4318
    tls:
      ca_file: /etc/otelcol-contrib/certs/ca.pem

Prometheus Exporter (Expose Metrics)

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: otel
    resource_to_telemetry_conversion:
      enabled: true

Debug Exporter (Development)

exporters:
  debug:
    verbosity: detailed    # basic, normal, detailed
    sampling_initial: 5
    sampling_thereafter: 200

Pipeline Patterns

Full Production Pipeline

# /etc/otelcol-contrib/config.yaml  # [CONFIRM]
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      filesystem: {}
      load: {}
      network: {}

  filelog/nginx:
    include:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
    start_at: end
    include_file_name: true

  journald:
    units:
      - sshd
      - nginx
      - opensearch

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
    spike_limit_mib: 100

  batch:
    timeout: 5s
    send_batch_size: 1000

  resource:
    attributes:
      - key: host.name
        from_attribute: host.name
        action: upsert
      - key: environment
        value: production
        action: insert

  filter/drop-health:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'

exporters:
  opensearch:
    http:
      endpoint: https://opensearch:9200
      tls:
        insecure_skip_verify: true
      auth:
        authenticator: basicauth/opensearch
    logs_index: otel-logs
    traces_index: otel-traces

  prometheus:
    endpoint: 0.0.0.0:8889

extensions:
  basicauth/opensearch:
    client_auth:
      username: otel
      password: otel_password

  health_check:
    endpoint: 0.0.0.0:13133

  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [basicauth/opensearch, health_check, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter/drop-health, resource, batch]
      exporters: [opensearch]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp, filelog/nginx, journald]
      processors: [memory_limiter, resource, batch]
      exporters: [opensearch]

Gateway Pattern (Fan-In from Edge Collectors)

# Gateway collector receives from edge collectors via OTLP  # [CONFIRM]
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 2048
    spike_limit_mib: 512

  batch:
    timeout: 10s
    send_batch_size: 5000

exporters:
  opensearch:
    http:
      endpoint: https://opensearch-cluster:9200
      tls:
        insecure_skip_verify: true
      auth:
        authenticator: basicauth/opensearch

  # The opensearch exporter accepts logs and traces; expose metrics
  # via the Prometheus exporter instead
  prometheus:
    endpoint: 0.0.0.0:8889

extensions:
  basicauth/opensearch:
    client_auth:
      username: otel
      password: otel_password

service:
  extensions: [basicauth/opensearch]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [opensearch]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [opensearch]
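
From an edge collector, a quick reachability check against the gateway's OTLP/gRPC port catches firewall and DNS problems before data is silently dropped (gateway-collector is the hostname used in the OTLP exporter example above):

# Verify the gateway is reachable from an edge node  # [READ-ONLY]
nc -zv gateway-collector 4317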

Firewall and SELinux

# Open OTLP ports  # [CONFIRM]
firewall-cmd --add-port=4317/tcp --permanent    # gRPC
firewall-cmd --add-port=4318/tcp --permanent    # HTTP
firewall-cmd --add-port=8889/tcp --permanent    # Prometheus exporter
firewall-cmd --add-port=13133/tcp --permanent   # Health check
firewall-cmd --reload

# SELinux: allow otelcol to read log files  # [CONFIRM]
setsebool -P daemons_read_all_files_and_dirs on
# Or more targeted:
semanage fcontext -a -t var_log_t "/var/log/otelcol(/.*)?"
restorecon -Rv /var/log/otelcol/
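
Verify the contexts were applied and check the audit log for denials if the collector still cannot read its files:

# Verify contexts and look for recent AVC denials  # [READ-ONLY]
ls -Z /var/log/otelcol/
ausearch -m AVC -ts recent 2>/dev/null | grep otelcol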

Monitoring the Collector

# Health check  # [READ-ONLY]
curl -s http://localhost:13133/

# zPages (internal debug)  # [READ-ONLY]
curl -s http://localhost:55679/debug/tracez
curl -s http://localhost:55679/debug/pipelinez

# Collector metrics  # [READ-ONLY]
curl -s http://localhost:8888/metrics     # Internal telemetry

# Logs  # [READ-ONLY]
journalctl -u otelcol-contrib -f
journalctl -u otelcol-contrib --since "5 min ago"
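
The internal telemetry on :8888 is the fastest way to spot dropped data; refused and failed counters should stay flat in a healthy pipeline:

# Watch for refused or failed data in internal telemetry  # [READ-ONLY]
curl -s http://localhost:8888/metrics \
  | grep -E 'otelcol_(receiver_refused|processor_refused|exporter_send_failed)'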

Checklist: OTel Collector Deployment

  • [ ] Install otelcol-contrib RPM
  • [ ] Create otelcol user and directories
  • [ ] Write and validate config (otelcol-contrib validate)
  • [ ] memory_limiter is first processor in every pipeline
  • [ ] batch is last processor before exporters
  • [ ] SELinux allows log file access
  • [ ] Firewall ports open for receivers
  • [ ] Systemd service enabled
  • [ ] Health check endpoint accessible
  • [ ] Test data flowing through pipeline

When to Use This Skill

  • Installing the OpenTelemetry Collector
  • Configuring data pipelines (logs, metrics, traces)
  • Setting up receivers for various data sources
  • Configuring exporters to OpenSearch or other backends
  • Tuning collector performance and memory limits
  • Troubleshooting data flow issues

Related Skills

  • rocky-foundation -- OS detection, safety tiers
  • rocky-core-system -- systemd service management
  • rocky-opensearch -- OpenSearch as export destination
  • rocky-selinux -- SELinux contexts for log access
  • rocky-networking -- Firewall rules for collector ports
  • rocky-webstack -- Nginx/PHP log collection

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.