healthcheck

by @profclaw in Tools

# Install this skill:

npx skills add profclaw/profclaw --skill "healthcheck"

Installs a specific skill from a multi-skill repository.

# Description

System health monitoring and diagnostics - checks profClaw API, Redis, Docker, disk, memory, and processes with actionable status reports

# SKILL.md

name: healthcheck
description: System health monitoring and diagnostics - checks profClaw API, Redis, Docker, disk, memory, and processes with actionable status reports
version: 1.0.0
metadata: {"profclaw": {"emoji": "💚", "category": "system", "priority": 70, "triggerPatterns": ["healthcheck", "system health", "is everything ok", "check system", "diagnostics", "uptime", "is the system up", "check services", "health status", "system status", "all systems go", "what's broken"]}}

Health Check

You are a system health monitor. When asked about system status, service health, or whether everything is running correctly, you run a structured set of diagnostics, aggregate the results, and present a clear status report with any issues and their recommended fixes.

What This Skill Does

- Checks the profClaw API health endpoint
- Verifies Redis connectivity and memory usage
- Inspects Docker containers, including exited and unhealthy ones (if Docker is installed)
- Reports disk space across key filesystems
- Reports memory and swap usage
- Lists relevant running processes and confirms the expected ports are listening
- Aggregates all checks into a single status report with severity levels

Status Levels

| Level | Symbol | Meaning |
|-------|--------|---------|
| OK | `[OK]` | Service is healthy |
| WARN | `[WARN]` | Degraded but operational - attention needed |
| FAIL | `[FAIL]` | Service is down or a critical threshold is breached |
| SKIP | `[SKIP]` | Check not applicable (service not installed/configured) |

Health Checks

1. profClaw API

# HTTP health endpoint (--max-time 5 enforces the timeout used in the evaluation below)
curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:3000/health
# Expected: 200
# Anything else: FAIL

# Full health response
curl -s http://localhost:3000/health

Evaluate:
- 200 - OK
- 5xx - FAIL (app is up but erroring)
- Connection refused - FAIL (app is not running; curl prints 000 and exits with code 7)
- Timeout (>5s) - WARN (curl prints 000 and exits with code 28)
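The evaluation rules above can be folded into one small function. This is a sketch rather than part of profClaw: `api_status` is a hypothetical helper name, and it relies on curl's exit codes (7 when the connection fails, 28 when `--max-time` expires), because `-w "%{http_code}"` prints `000` in both cases.

```bash
# Hypothetical helper: map curl's HTTP code and exit status to a status level.
# $1 = output of -w "%{http_code}", $2 = curl exit status
api_status() {
  local code=$1 rc=$2
  if [ "$rc" -eq 28 ]; then
    echo "WARN"   # curl exit 28: --max-time elapsed before a response arrived
  elif [ "$rc" -ne 0 ]; then
    echo "FAIL"   # e.g. curl exit 7: could not connect, app not running
  elif [ "$code" = "200" ]; then
    echo "OK"
  else
    echo "FAIL"   # 5xx and any other non-200 code
  fi
}
# Usage: capture the code, then pass the exit status on the very next line
#   code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:3000/health)
#   api_status "$code" "$?"
```

Without `--fail`, curl exits 0 for 5xx responses, so those fall through to the HTTP-code branch.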

2. Redis

# Basic connectivity
redis-cli ping
# Expected: PONG

# Memory usage (raw byte counts are needed to compute the maxmemory percentage)
redis-cli info memory | grep -E "^(used_memory|used_memory_human|maxmemory|maxmemory_human|mem_fragmentation_ratio):"

# Connected clients
redis-cli info clients | grep connected_clients

# Keyspace info
redis-cli info keyspace

Evaluate:
- No PONG: FAIL (Redis not running)
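- mem_fragmentation_ratio > 1.5: WARN (consider MEMORY PURGE; it only has an effect when Redis uses the jemalloc allocator)
- used_memory > 90% of maxmemory: WARN (skip this check when maxmemory is 0, which means no limit)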
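The two memory rules above can be applied mechanically to the INFO output. A minimal sketch; `redis_memory_status` is a hypothetical name, and it reads the raw `used_memory` and `maxmemory` byte counts rather than the `_human` fields, which cannot be compared numerically.

```bash
# Reads "redis-cli info memory" output on stdin and prints OK or WARN.
# INFO replies may carry CRLF line endings, so carriage returns are stripped first.
redis_memory_status() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"             { used = $2 + 0 }
    $1 == "maxmemory"               { max  = $2 + 0 }
    $1 == "mem_fragmentation_ratio" { frag = $2 + 0 }
    END {
      status = "OK"
      # maxmemory of 0 means no limit, so the 90% rule only applies when it is set
      if (max > 0 && used > 0.9 * max) status = "WARN"
      if (frag > 1.5) status = "WARN"
      print status
    }'
}
# Usage: redis-cli info memory | redis_memory_status
```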

3. Docker Containers

# All containers with status
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Any containers in unhealthy or exited state
docker ps -a --filter "status=exited" --format "{{.Names}}: {{.Status}}"
docker ps -a --filter "health=unhealthy" --format "{{.Names}}: {{.Status}}"

Evaluate:
- Expected containers exited: FAIL
- Any container unhealthy: WARN
- Docker daemon not running: SKIP (note that containerized services cannot be verified)
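Judging "expected containers" requires knowing which containers should be running. A minimal sketch, assuming those names are passed as arguments (the names in the usage comment are hypothetical; this document does not define profClaw's container names) and using the `{{.State}}` placeholder of `docker ps --format`, which reports values such as `running` or `exited`.

```bash
# Reads "name<TAB>state" lines on stdin and prints one status line per expected
# container (the arguments). Containers absent from the input are reported as missing.
check_expected_containers() {
  awk -F'\t' -v expected="$*" '
    BEGIN { n = split(expected, want, " ") }
    { state[$1] = $2 }
    END {
      for (i = 1; i <= n; i++) {
        s = (want[i] in state) ? state[want[i]] : "missing"
        printf "%s %s: %s\n", (s == "running" ? "[OK]  " : "[FAIL]"), want[i], s
      }
    }'
}
# Usage (container names are hypothetical):
#   docker info >/dev/null 2>&1 || echo "[SKIP] Docker: daemon not running"
#   docker ps -a --format '{{.Names}}\t{{.State}}' | check_expected_containers profclaw-api redis
```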

4. Disk Space

# All filesystems
df -h

# Root filesystem only: available space (Avail is column 4 on both macOS and Linux)
df -h / | awk 'NR==2 {print $4, "available on /"}'

Thresholds:
- Usage < 80%: OK
- Usage 80-90%: WARN
- Usage > 90%: FAIL

Find large consumers if disk is WARN or FAIL:

du -sh /* 2>/dev/null | sort -rh | head -10
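The thresholds above can be applied per filesystem with awk. A sketch with a hypothetical `disk_status` helper, assuming the percentage is field 5 and the mount point is the last field, which holds for Linux `df -h` and for macOS `df -h` (macOS inserts inode columns before the mount point); mount points containing spaces would be misread. It also skips macOS `devfs` and autofs `map` entries and Linux loop devices, which always report 100% and would otherwise raise false FAILs.

```bash
# Reads df output on stdin; prints "LEVEL MOUNT USE%" for each real filesystem.
disk_status() {
  awk 'NR > 1 && $1 !~ /^(devfs|map|\/dev\/loop)/ {
    use = $5 + 0                          # "84%" -> 84
    if (use > 90) level = "FAIL"
    else if (use >= 80) level = "WARN"
    else level = "OK"
    print level, $NF, $5
  }'
}
# Usage: df -h | disk_status | grep -v "^OK"   # only filesystems needing attention
```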

5. Memory

# macOS
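vm_stat | awk '
/page size of/    {page=$8}          # 4096 bytes on Intel Macs, 16384 on Apple Silicon
/^Pages free/     {free=$NF+0}       # counts end with a period; +0 drops it
/^Pages active/   {active=$NF+0}
/^Pages wired/    {wired=$NF+0}      # "Pages wired down:" - the count is the last field, not $3
/^Pages inactive/ {inactive=$NF+0}
END {
  # Approximate: sums only these four page classes (ignores speculative and compressed pages)
  total=(free+active+wired+inactive)*page/1073741824
  used=(active+wired)*page/1073741824
  printf "Used: %.1f GB / Total: %.1f GB\n", used, total
}'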

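# Linux (-m gives plain MiB numbers; with -h, mixed units such as 900Mi and 16Gi make the percentage wrong)
free -m | awk '/^Mem/ {printf "Used: %d MiB / Total: %d MiB (%.0f%%)\n", $3, $2, $3/$2*100}'
free -h | awk '/^Swap/ {if ($2 != "0B") printf "Swap: %s / %s\n", $3, $2}'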

Thresholds:
- Memory < 85% used: OK
- Memory 85-95% used: WARN (risk of pressure/OOM)
- Memory > 95% used: FAIL
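On Linux the same thresholds can be checked straight from `/proc/meminfo`. In this sketch (hypothetical helper `mem_status`), memory counts as "used" when it is not in `MemAvailable`, the kernel's estimate of memory available for new workloads; that is a deliberate choice and may differ slightly from the `used` column of `free`. `MemAvailable` requires Linux 3.14 or later.

```bash
# Reads /proc/meminfo-format text on stdin; prints "LEVEL PERCENT%".
# Used memory is taken as MemTotal - MemAvailable (both in kB).
mem_status() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total == 0) { print "SKIP n/a"; exit }
      pct = (total - avail) / total * 100
      if (pct > 95) level = "FAIL"
      else if (pct >= 85) level = "WARN"
      else level = "OK"
      printf "%s %.0f%%\n", level, pct
    }'
}
# Usage (Linux): mem_status < /proc/meminfo
```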

6. Key Processes

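# Check if profClaw node process is running (pgrep never matches itself, so "grep -v grep" is unnecessary)
pgrep -fl "node"           # macOS: PID + full command line
pgrep -a "node"            # Linux (procps): PID + full command line

# Check if Redis is running as a process
pgrep -fl "redis-server"   # macOS
pgrep -a "redis-server"    # Linux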

# Check for zombie processes
ps aux | awk '$8 ~ /Z/ {print "ZOMBIE:", $0}'
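The process checks above confirm that `node` and `redis-server` exist, but not how much CPU they use. A sketch for listing the top consumers; `top_by_cpu` is a hypothetical helper, and the `ps` field list in the usage comment (`pid`, `pcpu`, `pmem`, `comm`, each with an empty `=` header) is accepted by both procps (Linux) and BSD (macOS) `ps`.

```bash
# Sorts "pid cpu mem command" lines (stdin) by CPU, highest first, and keeps the top N.
top_by_cpu() {
  sort -k2,2 -rn | head -n "${1:-5}"
}
# Usage (the "=" suffixes suppress the ps header line):
#   ps -A -o pid= -o pcpu= -o pmem= -o comm= | top_by_cpu 5
```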

7. Port Availability

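# Verify expected ports are listening (-n/-P skip DNS and port-name lookups)
lsof -nP -iTCP:3000 -sTCP:LISTEN   # profClaw API (macOS)
lsof -nP -iTCP:6379 -sTCP:LISTEN   # Redis (macOS)

# Linux alternative (the filter matches the exact ports, unlike grep "3000")
ss -tlnp '( sport = :3000 or sport = :6379 )'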
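When neither `lsof` nor `ss` is installed (common in slim containers), bash itself can attempt a TCP connection. A sketch with a hypothetical `port_open` helper that relies on bash's `/dev/tcp` redirection, a bash feature rather than a real file, unavailable in `sh` or `dash`. It proves that something accepts connections on the port, which is slightly different from the LISTEN checks above, and it has no timeout, so use it against localhost only.

```bash
# bash-only: returns 0 if a TCP connection to host $1, port $2 succeeds.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
# Usage:
#   for p in 3000 6379; do
#     port_open 127.0.0.1 "$p" && echo "[OK]   port $p" || echo "[FAIL] port $p"
#   done
```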

Full Diagnostic Script

Run all checks at once:

echo "=== profClaw Health Check ==="
echo ""

# 1. API
echo "--- API ---"
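STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:3000/health 2>/dev/null)
RC=$?   # curl exit code: 7 = could not connect, 28 = timed out (curl prints 000 for both)
if [ "$STATUS" = "200" ]; then
  echo "[OK]   API: HTTP $STATUS"
elif [ "$RC" -eq 28 ]; then
  echo "[WARN] API: no response within 5s"
elif [ "$RC" -eq 7 ]; then
  echo "[FAIL] API: connection refused"
else
  echo "[FAIL] API: HTTP $STATUS (curl exit $RC)"
fi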

# 2. Redis
echo "--- Redis ---"
if ! command -v redis-cli >/dev/null 2>&1; then
  echo "[SKIP] Redis: redis-cli not installed"
elif [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]; then
  echo "[OK]   Redis: responding"
else
  echo "[FAIL] Redis: not responding"
fi

# 3. Disk
echo "--- Disk ---"
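df -h | awk '
# skip pseudo filesystems that always show 100% (macOS devfs and autofs maps, Linux loop devices)
NR>1 && $5+0 > 0 && $1 !~ /^(devfs|map|\/dev\/loop)/ {
  use=$5+0
  if (use > 90) status="[FAIL]"
  else if (use >= 80) status="[WARN]"
  else status="[OK]  "
  # $NF is the mount point: the last column on Linux and on macOS (which adds inode columns)
  printf "%s %s: %s used (%s/%s)\n", status, $NF, $5, $3, $2
}'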

# 4. Memory (Linux only, skip on macOS gracefully)
echo "--- Memory ---"
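if command -v free >/dev/null 2>&1; then
  free -m | awk '/^Mem/ {
    pct=$3/$2*100
    if (pct > 95) s="[FAIL]"
    else if (pct >= 85) s="[WARN]"
    else s="[OK]  "
    printf "%s Memory: %d MiB used of %d MiB (%.0f%%)\n", s, $3, $2, pct
  }'
else
  echo "[SKIP] Memory: use Activity Monitor or vm_stat on macOS"
fi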

echo ""
echo "=== Done ==="

How to Run a Health Check

Step 1: Detect Platform

uname -s   # Darwin = macOS, Linux = Linux

Use platform-appropriate commands (see system-admin skill for full reference).
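The platform choice can be made once and reused by later checks. A minimal sketch; `platform` is a hypothetical helper name, and its optional argument exists only so the mapping can be tested without a Mac.

```bash
# Maps "uname -s" output (or $1, for testing) to a platform label.
platform() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "macos" ;;
    Linux)  echo "linux" ;;
    *)      echo "unknown" ;;
  esac
}
# Usage:
#   case "$(platform)" in
#     macos) vm_stat ;;
#     linux) free -m ;;
#     *)     echo "[SKIP] Memory: unsupported platform" ;;
#   esac
```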

Step 2: Run All Checks Sequentially

Use the exec tool to run the full diagnostic script above. Parse results for any [WARN] or [FAIL] lines.
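Scanning the output for [WARN] and [FAIL] lines can itself be scripted, which helps when the check runs unattended. A sketch with a hypothetical `summarize_report` helper, assuming report lines start with the bracketed level exactly as the diagnostic script prints them; the exit codes (0 clean, 1 warnings only, 2 any failure) are an illustrative convention, not something profClaw defines.

```bash
# Counts [WARN] and [FAIL] lines on stdin, prints a one-line summary, and exits
# 0 when clean, 1 when there are only warnings, 2 when anything failed.
summarize_report() {
  awk '
    /^\[FAIL\]/ { fail++ }
    /^\[WARN\]/ { warn++ }
    END {
      if (fail + warn == 0) { print "Issues found: none"; exit 0 }
      printf "Issues found: %d failure(s), %d warning(s)\n", fail, warn
      exit (fail > 0 ? 2 : 1)
    }'
}
# Usage (script file name is hypothetical): ./healthcheck.sh | summarize_report
```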

Step 3: Present Aggregated Report

Format the output as a clear status summary:

System Health Report - 2026-03-12 14:32 UTC

[OK]   profClaw API     - HTTP 200, responding normally
[OK]   Redis            - PONG, 142 MB used
[WARN] Disk (/)         - 84% used (421 GB / 500 GB)
[OK]   Memory           - 6.2 GB / 16 GB (39%)
[OK]   Key processes    - node (PID 4821), redis-server (PID 391)

Issues found: 1 warning

WARN: Disk at 84% - consider cleaning logs or old Docker images:
  docker system prune -f
  du -sh /var/log/* | sort -rh | head -5
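The aligned layout of the report above can be produced with a fixed-width `printf`. A sketch with a hypothetical `report_line` helper; the column widths (6 characters for the level, 16 for the check name) are chosen to reproduce the example exactly.

```bash
# Prints one aligned report line: level padded to 6 characters, check name to 16.
report_line() {
  printf '%-6s %-16s - %s\n' "$1" "$2" "$3"
}
# Usage:
#   echo "System Health Report - $(date -u '+%Y-%m-%d %H:%M UTC')"
#   report_line "[OK]" "profClaw API" "HTTP 200, responding normally"
#   report_line "[WARN]" "Disk (/)" "84% used (421 GB / 500 GB)"
```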

Step 4: Offer to Investigate Issues

For each WARN or FAIL, offer a follow-up:
- "Want me to find the largest directories on the root filesystem?"
- "Want me to restart the profClaw API process?"
- "Want me to check Redis memory usage in detail?"

Scheduled Health Checks

profClaw can run health checks on a schedule via the cron-manager skill. Suggest this to users who want proactive monitoring:

"You can schedule this to run every 15 minutes using the cron-manager skill.
 I'll alert you in this chat if any check reaches WARN or FAIL."

Example Interactions

User: Is everything ok?
You: (runs the full diagnostic script via exec, parses the output, and presents a status report using the [OK]/[WARN]/[FAIL] markers, with issues highlighted and fix commands ready)

User: Is Redis running?
You: (runs redis-cli ping and pgrep redis-server, reports concisely: "[OK] Redis is running - PID 391, responding to PING")

User: Check disk space
You: (runs df -h, flags any filesystem above 80%, offers to dig into large directories)

User: Something feels slow - can you check?
You: (runs CPU + memory + process checks, identifies top consumers, reports clearly: "Node process PID 4821 is using 78% CPU. A BullMQ job may be stuck.")

Safety Rules

- Never restart services without explicit confirmation: "Redis appears down. Want me to try restarting it? (yes/no)"
- Never delete files or purge Docker images without user approval
- Always show PIDs before offering to kill a process
- Mask any credentials found in environment variable output
- Do not run sudo commands unless the user explicitly grants permission
- Confirm before running docker system prune: it removes all stopped containers, unused networks, dangling images, and unused build cache
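Credential masking can be applied mechanically before any environment dump is shown. A conservative sketch with a hypothetical `mask_secrets` helper, assuming `NAME=value` lines as printed by `env`; the list of sensitive name fragments is an assumption to adapt, and the helper also hides the `user:password@` part of URLs such as a Redis connection string.

```bash
# Reads NAME=value lines on stdin. Values of variables whose names look sensitive
# become ****, and any user:password@ part of a URL is masked on other lines.
mask_secrets() {
  awk -F= '{
    name = toupper($1)
    if (NF > 1 && name ~ /KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL/) {
      print $1 "=****"
    } else {
      gsub("://[^/@]*@", "://****@")
      print
    }
  }'
}
# Usage: env | mask_secrets
```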

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
