Install this specific skill from the multi-skill repository:

```shell
npx skills add netsapiensis/claude-code-skills --skill "rocky-opentelemetry"
```
# Description
OpenTelemetry Collector administration on Rocky Linux 8/9 including otelcol-contrib installation, configuration of receivers, processors, exporters, and pipelines for metrics, traces, and logs. Covers OTLP, Prometheus, filelog, hostmetrics, journald receivers and OpenSearch, OTLP, Prometheus exporters. Use when setting up observability pipelines or configuring the OTel collector.
# SKILL.md

```yaml
name: rocky-opentelemetry
description: OpenTelemetry Collector administration on Rocky Linux 8/9 including otelcol-contrib installation, configuration of receivers, processors, exporters, and pipelines for metrics, traces, and logs. Covers OTLP, Prometheus, filelog, hostmetrics, journald receivers and OpenSearch, OTLP, Prometheus exporters. Use when setting up observability pipelines or configuring the OTel collector.
```
# OpenTelemetry Collector Administration

Installation, configuration, receivers, processors, exporters, and pipeline patterns for the OpenTelemetry Collector on Rocky Linux 8/9.

Prerequisite: see rocky-foundation for OS detection and safety tier definitions.
## Installation

### RPM Install (otelcol-contrib)

```shell
# Download and install the otelcol-contrib RPM # [CONFIRM]
# Check the latest version at:
# https://github.com/open-telemetry/opentelemetry-collector-releases/releases
OTEL_VERSION="0.96.0"  # Update to the latest release

# Download the RPM # [CONFIRM]
curl -Lo /tmp/otelcol-contrib.rpm \
  "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_amd64.rpm"

# Install # [CONFIRM]
dnf install -y /tmp/otelcol-contrib.rpm

# Verify installation # [READ-ONLY]
otelcol-contrib --version
```
### Systemd Service Setup

The RPM typically installs a systemd unit. If not, create one:

```ini
# /etc/systemd/system/otelcol-contrib.service # [CONFIRM]
[Unit]
Description=OpenTelemetry Collector Contrib
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=otelcol
Group=otelcol
ExecStart=/usr/bin/otelcol-contrib --config=/etc/otelcol-contrib/config.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
MemoryMax=512M

# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/log/otelcol /var/lib/otelcol
PrivateTmp=yes

[Install]
WantedBy=multi-user.target
```
```shell
# Create a dedicated service user # [CONFIRM]
useradd -r -s /sbin/nologin -d /var/lib/otelcol otelcol

# Create directories # [CONFIRM]
mkdir -p /etc/otelcol-contrib /var/log/otelcol /var/lib/otelcol
chown otelcol:otelcol /var/log/otelcol /var/lib/otelcol

# Enable and start # [CONFIRM]
systemctl daemon-reload
systemctl enable --now otelcol-contrib

# Check status # [READ-ONLY]
systemctl status otelcol-contrib
journalctl -u otelcol-contrib --since "5 min ago"
```
### Validate Configuration

```shell
# Validate config before applying # [READ-ONLY]
otelcol-contrib validate --config=/etc/otelcol-contrib/config.yaml

# Reload config # [CONFIRM]
systemctl reload otelcol-contrib
# Or: kill -HUP $(pgrep otelcol)
```
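A small wrapper can enforce the validate-before-reload habit so a broken config never reaches the running service. A minimal sketch; the helper name `reload_if_valid` is ours, not part of the collector tooling:

```shell
#!/bin/sh
# reload_if_valid [CONFIG] -- validate the given config (default: the
# standard path) and reload the service only if validation succeeds.
reload_if_valid() {
  cfg="${1:-/etc/otelcol-contrib/config.yaml}"
  if otelcol-contrib validate --config="$cfg"; then
    systemctl reload otelcol-contrib
  else
    echo "invalid config: $cfg -- not reloading" >&2
    return 1
  fi
}
```

Usage: `reload_if_valid /etc/otelcol-contrib/config.yaml` after every edit.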
## Configuration Structure

The OTel Collector config has four main sections:

```yaml
# /etc/otelcol-contrib/config.yaml
receivers:   # How data gets in
processors:  # How data is transformed
exporters:   # Where data goes out
service:     # Ties receivers -> processors -> exporters into pipelines
```
### Minimal Working Configuration

```yaml
# /etc/otelcol-contrib/config.yaml # [CONFIRM]
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
    spike_limit_mib: 100
  batch:
    timeout: 5s
    send_batch_size: 1000

exporters:
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
```
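To confirm the minimal pipeline is wired end to end, you can POST one synthetic log record to the OTLP/HTTP receiver and watch the debug exporter echo it in the service journal. A sketch; the service name `smoke-test` is arbitrary:

```shell
#!/bin/sh
# One OTLP/JSON log record (service name "smoke-test" is made up).
PAYLOAD='{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},"scopeLogs":[{"logRecords":[{"severityText":"INFO","body":{"stringValue":"pipeline smoke test"}}]}]}]}'

# Requires the collector running with the config above; the debug exporter
# then logs the record (visible via: journalctl -u otelcol-contrib).
curl -s -X POST http://localhost:4318/v1/logs \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD" || echo "collector not reachable on :4318"
```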
## Receivers

### OTLP Receiver

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 4
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "https://example.com"
```
### Prometheus Receiver (Scrape Targets)

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'node-exporter'
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:9100']
              labels:
                env: production
        - job_name: 'opensearch'
          scrape_interval: 30s
          metrics_path: /_prometheus/metrics
          scheme: https
          static_configs:
            - targets: ['localhost:9200']
          basic_auth:
            username: admin
            password: admin
          tls_config:
            insecure_skip_verify: true
```
### Filelog Receiver (Log Files)

```yaml
receivers:
  filelog:
    include:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
      - /var/log/myapp/*.log
    exclude:
      - /var/log/nginx/*.gz
    start_at: end  # 'beginning' or 'end'
    include_file_name: true
    include_file_path: true
    operators:
      - type: regex_parser
        regex: '^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
        timestamp:
          parse_from: attributes.time_local
          layout: '02/Jan/2006:15:04:05 -0700'
        severity:
          parse_from: attributes.status
          mapping:
            error: ['500', '502', '503', '504']
            warn: ['400', '401', '403', '404']
            info: ['200', '201', '301', '302']

  filelog/json:
    include:
      - /var/log/myapp/app.json.log
    start_at: end
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
```
### Host Metrics Receiver

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      disk: {}
      filesystem: {}
      load: {}
      network: {}
      processes: {}
      process:
        include:
          match_type: regexp
          names:
            - opensearch
            - nginx
            - php-fpm
```
### Journald Receiver

```yaml
receivers:
  journald:
    directory: /var/log/journal
    units:
      - sshd
      - nginx
      - opensearch
      - otelcol-contrib
    priority: info  # Minimum priority to collect
```
Note: the otelcol user needs access to the journal:

```shell
# Add otelcol to the systemd-journal group # [CONFIRM]
usermod -aG systemd-journal otelcol
systemctl restart otelcol-contrib
```
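A quick way to confirm the group change took effect is to check the user's supplementary groups and then read one journal entry as that user. A sketch; the helper name `in_group` is ours:

```shell
#!/bin/sh
# in_group USER GROUP -- succeed if USER's supplementary groups include GROUP
in_group() { id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"; }

# Live check (requires the otelcol user created during setup):
if in_group otelcol systemd-journal; then
  # Reading one entry as otelcol proves end-to-end journal access
  sudo -u otelcol journalctl -n 1 --no-pager >/dev/null && echo "journal readable"
else
  echo "otelcol is not in systemd-journal (service restart needed, or user missing)"
fi
```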
## Processors

### Mandatory Processors

Always include memory_limiter and batch in every pipeline.

```yaml
processors:
  # MUST be first in every pipeline -- prevents OOM kills
  memory_limiter:
    check_interval: 5s
    limit_mib: 400        # Hard limit
    spike_limit_mib: 100  # Headroom below the hard limit; data is refused once memory exceeds limit_mib - spike_limit_mib
  # SHOULD be last before the exporters -- improves throughput
  batch:
    timeout: 5s
    send_batch_size: 1000
    send_batch_max_size: 1500
```
### Filter Processor

```yaml
processors:
  filter/drop-health:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
    metrics:
      metric:
        - 'name == "system.cpu.time" and resource.attributes["host.name"] == "test-host"'
    logs:
      log_record:
        - 'severity_number < 9'  # Drop TRACE and DEBUG (below INFO)
```
### Transform Processor

```yaml
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["service.name"], "myapp") where resource.attributes["service.name"] == nil
          - set(attributes["env"], "production")
    trace_statements:
      - context: span
        statements:
          - set(attributes["deployment.environment"], "production")
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["env"], "production")
```
### Resource Processor

```yaml
processors:
  resource:
    attributes:
      - key: host.name
        from_attribute: host.name
        action: upsert
      - key: environment
        value: production
        action: insert
      - key: service.namespace
        value: myteam
        action: insert
```
### Attributes Processor

```yaml
processors:
  attributes/remove-sensitive:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: db.statement
        action: hash  # Hash sensitive values instead of dropping them
      - key: user.email
        action: delete
```
## Exporters

### OpenSearch Exporter

```yaml
exporters:
  opensearch:
    http:
      endpoint: https://localhost:9200
      tls:
        insecure_skip_verify: true  # For self-signed certs
      auth:
        authenticator: basicauth/opensearch
    logs_index: otel-logs
    traces_index: otel-traces
    # Retry and queue
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

extensions:
  basicauth/opensearch:
    client_auth:
      username: admin
      password: admin
```

The basicauth extension must also be listed under `service.extensions`, or it will not load.
### OTLP Exporter (to Another Collector)

```yaml
exporters:
  otlp:
    endpoint: gateway-collector:4317
    tls:
      insecure: false
      ca_file: /etc/otelcol-contrib/certs/ca.pem
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      queue_size: 5000

  otlphttp:
    endpoint: https://gateway-collector:4318
    tls:
      ca_file: /etc/otelcol-contrib/certs/ca.pem
```
### Prometheus Exporter (Expose Metrics)

```yaml
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: otel
    resource_to_telemetry_conversion:
      enabled: true
```
### Debug Exporter (Development)

```yaml
exporters:
  debug:
    verbosity: detailed  # basic, normal, detailed
    sampling_initial: 5
    sampling_thereafter: 200
```
## Pipeline Patterns

### Full Production Pipeline

```yaml
# /etc/otelcol-contrib/config.yaml # [CONFIRM]
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      filesystem: {}
      load: {}
      network: {}
  filelog/nginx:
    include:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
    start_at: end
    include_file_name: true
  journald:
    units:
      - sshd
      - nginx
      - opensearch

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
    spike_limit_mib: 100
  batch:
    timeout: 5s
    send_batch_size: 1000
  resource:
    attributes:
      - key: host.name
        from_attribute: host.name
        action: upsert
      - key: environment
        value: production
        action: insert
  filter/drop-health:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'

exporters:
  opensearch:
    http:
      endpoint: https://opensearch:9200
      tls:
        insecure_skip_verify: true
      auth:
        authenticator: basicauth/opensearch
    logs_index: otel-logs
    traces_index: otel-traces
  prometheus:
    endpoint: 0.0.0.0:8889

extensions:
  basicauth/opensearch:
    client_auth:
      username: otel
      password: otel_password
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [basicauth/opensearch, health_check, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter/drop-health, resource, batch]
      exporters: [opensearch]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp, filelog/nginx, journald]
      processors: [memory_limiter, resource, batch]
      exporters: [opensearch]
```
### Gateway Pattern (Fan-In from Edge Collectors)

```yaml
# Gateway collector: receives from edge collectors via OTLP # [CONFIRM]
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 2048
    spike_limit_mib: 512
  batch:
    timeout: 10s
    send_batch_size: 5000

exporters:
  opensearch:
    http:
      endpoint: https://opensearch-cluster:9200
      tls:
        insecure_skip_verify: true
      auth:
        authenticator: basicauth/opensearch

extensions:
  basicauth/opensearch:
    client_auth:
      username: otel
      password: otel_password

service:
  extensions: [basicauth/opensearch]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [opensearch]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [opensearch]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [opensearch]
```
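On each edge node, the matching half of this pattern is the otlp exporter from the Exporters section pointed at the gateway. A minimal sketch; the hostname `gateway-collector` is a placeholder:

```yaml
# Edge collector: forward everything to the gateway over OTLP/gRPC
exporters:
  otlp:
    endpoint: gateway-collector:4317
    tls:
      ca_file: /etc/otelcol-contrib/certs/ca.pem

service:
  pipelines:
    logs:
      receivers: [filelog, journald]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```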
## Firewall and SELinux

```shell
# Open OTLP ports # [CONFIRM]
firewall-cmd --add-port=4317/tcp --permanent   # OTLP gRPC
firewall-cmd --add-port=4318/tcp --permanent   # OTLP HTTP
firewall-cmd --add-port=8889/tcp --permanent   # Prometheus exporter
firewall-cmd --add-port=13133/tcp --permanent  # Health check
firewall-cmd --reload

# SELinux: allow otelcol to read log files # [CONFIRM]
setsebool -P daemons_read_all_files_and_dirs on

# Or, more targeted:
semanage fcontext -a -t var_log_t "/var/log/otelcol(/.*)?"
restorecon -Rv /var/log/otelcol/
```
## Monitoring the Collector

```shell
# Health check # [READ-ONLY]
curl -s http://localhost:13133/

# zPages (internal debug) # [READ-ONLY]
curl -s http://localhost:55679/debug/tracez
curl -s http://localhost:55679/debug/pipelinez

# Collector internal telemetry # [READ-ONLY]
curl -s http://localhost:8888/metrics

# Logs # [READ-ONLY]
journalctl -u otelcol-contrib -f
journalctl -u otelcol-contrib --since "5 min ago"
```
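The internal telemetry on :8888 is the quickest way to spot a pipeline that is silently refusing or failing data (e.g. memory_limiter back-pressure, exporter send failures). A small filter sketch; the helper name `failed_counters` is ours, the `otelcol_*` counter prefixes are the collector's standard internal metric names:

```shell
#!/bin/sh
# failed_counters -- keep only refused/failed counters from a
# Prometheus-format metrics dump read on stdin.
failed_counters() {
  grep -E '^otelcol_(receiver_refused|processor_refused|exporter_send_failed)'
}

# Live usage (requires the collector's internal telemetry endpoint):
curl -s http://localhost:8888/metrics | failed_counters \
  || echo "no refused/failed counters (or endpoint unreachable)"
```

Non-zero and growing counters here mean data is being dropped somewhere in the pipeline.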
## Checklist: OTel Collector Deployment

- [ ] Install otelcol-contrib RPM
- [ ] Create otelcol user and directories
- [ ] Write and validate config (`otelcol-contrib validate`)
- [ ] `memory_limiter` is the first processor in every pipeline
- [ ] `batch` is the last processor before exporters
- [ ] SELinux allows log file access
- [ ] Firewall ports open for receivers
- [ ] Systemd service enabled
- [ ] Health check endpoint accessible
- [ ] Test data flowing through the pipeline
## When to Use This Skill
- Installing the OpenTelemetry Collector
- Configuring data pipelines (logs, metrics, traces)
- Setting up receivers for various data sources
- Configuring exporters to OpenSearch or other backends
- Tuning collector performance and memory limits
- Troubleshooting data flow issues
## Related Skills
- rocky-foundation -- OS detection, safety tiers
- rocky-core-system -- systemd service management
- rocky-opensearch -- OpenSearch as export destination
- rocky-selinux -- SELinux contexts for log access
- rocky-networking -- Firewall rules for collector ports
- rocky-webstack -- Nginx/PHP log collection
# Supported AI Coding Agents

This skill follows the SKILL.md standard and works with all major AI coding agents.