npx skills add profclaw/profclaw --skill "healthcheck"
Install specific skill from multi-skill repository
# Description
System health monitoring and diagnostics - checks profClaw API, Redis, Docker, disk, memory, and processes with actionable status reports
# SKILL.md
name: healthcheck
description: System health monitoring and diagnostics - checks profClaw API, Redis, Docker, disk, memory, and processes with actionable status reports
version: 1.0.0
metadata: {"profclaw": {"emoji": "💚", "category": "system", "priority": 70, "triggerPatterns": ["healthcheck", "system health", "is everything ok", "check system", "diagnostics", "uptime", "is the system up", "check services", "health status", "system status", "all systems go", "what's broken"]}}
Health Check
You are a system health monitor. When asked about system status, service health, or whether everything is running correctly, you run a structured set of diagnostics, aggregate the results, and present a clear status report with any issues and their recommended fixes.
What This Skill Does
- Checks profClaw API health endpoint
- Verifies Redis connectivity and memory usage
- Inspects running Docker containers (if applicable)
- Reports disk space across key filesystems
- Reports memory and swap usage
- Lists relevant running processes
- Aggregates all checks into a single status report with severity levels
Status Levels
| Level | Symbol | Meaning |
|---|---|---|
| OK | [OK] | Service is healthy |
| WARN | [WARN] | Degraded but operational - attention needed |
| FAIL | [FAIL] | Service is down or critical threshold breached |
| SKIP | [SKIP] | Check not applicable (service not installed/configured) |
Health Checks
1. profClaw API
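# HTTP health endpoint (status code only, give up after 5 seconds)
curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:3000/health
# Expected: 200
# 000 means no HTTP response arrived (connection refused or timeout)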
# Full health response
curl -s http://localhost:3000/health
Evaluate:
- 200 - OK
- 5xx - FAIL (app is up but erroring)
- Connection refused - FAIL (app is not running)
- Timeout (>5s) - WARN
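The evaluation rules above can be sketched as a small shell function. This is a minimal sketch, not part of profClaw: `classify_api` is a hypothetical helper name, and it assumes the caller passes curl's exit status (7 means the connection failed, 28 means the operation timed out) together with the `%{http_code}` value.

```bash
# Hypothetical helper: map a curl exit status and HTTP code to a report level.
classify_api() {
  curl_exit=$1   # exit status of curl (7 = connect failed, 28 = timed out)
  http_code=$2   # value of %{http_code}; curl prints 000 when no response arrived
  if [ "$curl_exit" -eq 28 ]; then
    echo "WARN"          # timeout
  elif [ "$curl_exit" -ne 0 ]; then
    echo "FAIL"          # connection refused or another transport error
  elif [ "$http_code" = "200" ]; then
    echo "OK"
  else
    echo "FAIL"          # 5xx and any other non-200 status
  fi
}
```

One possible way to call it: run `code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:3000/health); rc=$?` and then print `[$(classify_api "$rc" "$code")] API: HTTP $code`.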
2. Redis
# Basic connectivity
redis-cli ping
# Expected: PONG
# Memory usage
redis-cli info memory | grep -E "used_memory_human|maxmemory_human|mem_fragmentation_ratio"
# Connected clients
redis-cli info clients | grep connected_clients
# Keyspace info
redis-cli info keyspace
Evaluate:
- No PONG: FAIL (Redis not running)
- mem_fragmentation_ratio > 1.5: WARN (consider MEMORY PURGE)
- used_memory > 90% of maxmemory: WARN (only when maxmemory is set; a value of 0 means no limit)
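As an illustration of the memory rules above, the following sketch reads `redis-cli info memory` output on stdin and prints a level. `redis_memory_level` is a hypothetical name; the sketch assumes the raw `used_memory`, `maxmemory`, and `mem_fragmentation_ratio` fields are present and skips the maxmemory rule when maxmemory is 0 (no limit configured).

```bash
# Hypothetical helper: classify Redis memory health from INFO output on stdin.
redis_memory_level() {
  # redis-cli ends lines with CRLF, so strip carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "used_memory"             { used = $2 }
    $1 == "maxmemory"               { max = $2 }
    $1 == "mem_fragmentation_ratio" { frag = $2 }
    END {
      level = "OK"
      if (frag + 0 > 1.5) level = "WARN"                   # fragmentation rule
      if (max + 0 > 0 && used / max > 0.9) level = "WARN"  # maxmemory rule
      print level
    }'
}
```

Usage would look like `redis-cli info memory | redis_memory_level`. Connectivity (the PONG check) is deliberately left out, since a failed ping is a FAIL before memory is even examined.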
3. Docker Containers
# All containers with status
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Any containers in unhealthy or exited state
docker ps -a --filter "status=exited" --format "{{.Names}}: {{.Status}}"
docker ps -a --filter "health=unhealthy" --format "{{.Names}}: {{.Status}}"
Evaluate:
- Expected containers exited: FAIL
- Any container unhealthy: WARN
- Docker daemon not running: SKIP (note that containerized services cannot be verified)
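The container rules above could be applied mechanically along these lines. This is a sketch under assumptions: `docker_level` is a hypothetical helper, the list of expected container names is supplied by the caller, and the input is the `{{.Names}}: {{.Status}}` format used in this section, where an exited container's status starts with `Exited` and an unhealthy one contains `(unhealthy)`.

```bash
# Hypothetical helper: classify container state from "name: status" lines on stdin.
docker_level() {
  # "$*" = space-separated names of containers expected to be running
  awk -F': ' -v expected=" $* " '
    BEGIN { level = "OK" }
    # An expected container that has exited is a failure
    $2 ~ /^Exited/ && index(expected, " " $1 " ") > 0 { level = "FAIL" }
    # Any unhealthy container is a warning unless a failure was already seen
    $2 ~ /\(unhealthy\)/ && level != "FAIL" { level = "WARN" }
    END { print level }'
}
```

For example, `docker ps -a --format "{{.Names}}: {{.Status}}" | docker_level profclaw redis`, where `profclaw` and `redis` are placeholder container names. The SKIP case (daemon not running) would need to be detected before calling this.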
4. Disk Space
# All filesystems
df -h
# Available space on the root filesystem (works on macOS and Linux)
df -h / | awk 'NR==2 {print $4, "available on /"}'
Thresholds:
- Usage < 80%: OK
- Usage 80-89%: WARN
- Usage >= 90%: FAIL
Find large consumers if disk is WARN or FAIL:
du -sh /* 2>/dev/null | sort -rh | head -10
5. Memory
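# macOS (approximate: counts free, active, wired, and inactive pages only)
vm_stat | awk '
# Page size comes from the header line (16384 on Apple Silicon, 4096 on Intel)
/page size of/ {page=$8}
/Pages free/ {free=$3}
/Pages active/ {active=$3}
# The label is "Pages wired down:", so the count is the fourth field
/Pages wired down/ {wired=$4}
/Pages inactive/ {inactive=$3}
END {
if (page+0 == 0) page=4096
total=(free+active+wired+inactive)*page/1073741824
used=(active+wired)*page/1073741824
printf "Used: %.1f GB / Total: %.1f GB\n", used, total
}'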
# Linux
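# Use -m so both values share one unit; dividing -h output (e.g. 512Mi by 15Gi) gives a wrong percentage
free -m | awk '/^Mem/ {printf "Used: %d MiB / Total: %d MiB (%.0f%%)\n", $3, $2, $3/$2*100}'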
free -h | awk '/^Swap/ {if ($2 != "0B") printf "Swap: %s / %s\n", $3, $2}'
Thresholds:
- Memory < 85% used: OK
- Memory 85-94% used: WARN (risk of pressure/OOM)
- Memory >= 95% used: FAIL
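The disk and memory thresholds share one shape, so a single helper can cover both. The sketch below is illustrative only: `threshold_level` is a hypothetical name, it expects an integer percentage, and it assumes a value exactly on a threshold takes the more severe level (as the full diagnostic script does for disk).

```bash
# Hypothetical helper: map a usage percentage to OK / WARN / FAIL.
threshold_level() {
  # $1 = usage percentage (integer), $2 = WARN threshold, $3 = FAIL threshold
  if [ "$1" -ge "$3" ]; then
    echo "FAIL"
  elif [ "$1" -ge "$2" ]; then
    echo "WARN"
  else
    echo "OK"
  fi
}
```

With the thresholds from this document, disk usage would be checked as `threshold_level "$pct" 80 90` and memory as `threshold_level "$pct" 85 95`.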
6. Key Processes
# Check if profClaw node process is running (pgrep never lists itself, so no extra filtering is needed)
pgrep -la "node"
# Check if Redis is running as a process
pgrep -la "redis-server"
# Check for zombie processes
ps aux | awk '$8 ~ /Z/ {print "ZOMBIE:", $0}'
7. Port Availability
# Verify expected ports are listening
lsof -i :3000 | grep LISTEN # profClaw API (macOS)
lsof -i :6379 | grep LISTEN # Redis (macOS)
# Linux alternative
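# Anchor on the colon and trailing whitespace so ports like 30001 or 63790 do not match
ss -tlnp | grep -E ":(3000|6379)[[:space:]]"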
Full Diagnostic Script
Run all checks at once:
echo "=== profClaw Health Check ==="
echo ""
# 1. API
echo "--- API ---"
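STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:3000/health 2>/dev/null)
# curl prints 000 when no HTTP response arrived (connection refused or timeout)
[ "$STATUS" = "000" ] && STATUS=""
[ "$STATUS" = "200" ] && echo "[OK] API: HTTP $STATUS" || echo "[FAIL] API: HTTP ${STATUS:-no response}"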
# 2. Redis
echo "--- Redis ---"
PONG=$(redis-cli ping 2>/dev/null)
[ "$PONG" = "PONG" ] && echo "[OK] Redis: responding" || echo "[FAIL] Redis: not responding"
# 3. Disk
echo "--- Disk ---"
# -P forces the POSIX column layout so $6 is the mount point on macOS as well as Linux
df -hP | awk 'NR>1 && $6 != "" && $5+0 > 0 {
use=$5+0
if (use >= 90) status="[FAIL]"
else if (use >= 80) status="[WARN]"
else status="[OK] "
printf "%s %s: %s used (%s/%s)\n", status, $6, $5, $3, $2
}'
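# 4. Memory (Linux only, skip on macOS gracefully)
echo "--- Memory ---"
if command -v free >/dev/null 2>&1; then
# -m keeps used and total in the same unit so the percentage is valid
free -m | awk '/^Mem/ {
pct=$3/$2*100
if (pct >= 95) status="[FAIL]"
else if (pct >= 85) status="[WARN]"
else status="[OK]"
printf "%s Memory: %d MiB used of %d MiB (%.0f%%)\n", status, $3, $2, pct
}'
else
echo "[SKIP] Memory: free not available (use the vm_stat check on macOS)"
fi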
echo ""
echo "=== Done ==="
How to Run a Health Check
Step 1: Detect Platform
uname -s # Darwin = macOS, Linux = Linux
Use platform-appropriate commands (see system-admin skill for full reference).
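Platform detection can be wrapped in a small function so later checks can branch on one value. This is a minimal sketch; `detect_platform` and the labels it prints are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: normalize `uname -s` output to a short platform label.
detect_platform() {
  # An optional $1 overrides the uname output, which makes the function easy to test
  case "${1:-$(uname -s)}" in
    Darwin) echo "macos" ;;
    Linux)  echo "linux" ;;
    *)      echo "unknown" ;;
  esac
}
```

A caller might then write `if [ "$(detect_platform)" = "macos" ]; then ...` to pick `vm_stat` and `lsof` over `free` and `ss`.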
Step 2: Run All Checks Sequentially
Use the exec tool to run the full diagnostic script above. Parse results for any [WARN] or [FAIL] lines.
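Parsing the script output for [WARN] and [FAIL] lines might look like the sketch below. `summarize_report` is a hypothetical helper; it assumes each result line starts with its bracketed level, as in the full diagnostic script, and its exit codes (0, 1, 2) are an illustrative convention rather than anything profClaw defines.

```bash
# Hypothetical helper: count WARN and FAIL lines in diagnostic output on stdin.
summarize_report() {
  # Exit status: 2 if any [FAIL] line was seen, 1 if only [WARN], 0 otherwise
  awk '
    /^\[WARN\]/ { warn++ }
    /^\[FAIL\]/ { fail++ }
    END {
      printf "Issues found: %d warning(s), %d failure(s)\n", warn, fail
      exit (fail > 0 ? 2 : (warn > 0 ? 1 : 0))
    }'
}
```

Piping the diagnostic script into this function yields the "Issues found" line for the report, and the exit status lets a scheduler decide whether to raise an alert.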
Step 3: Present Aggregated Report
Format the output as a clear status summary:
System Health Report - 2026-03-12 14:32 UTC
[OK] profClaw API - HTTP 200, responding normally
[OK] Redis - PONG, 142 MB used
[WARN] Disk (/) - 84% used (421 GB / 500 GB)
[OK] Memory - 6.2 GB / 16 GB (39%)
[OK] Key processes - node (PID 4821), redis-server (PID 391)
Issues found: 1 warning
WARN: Disk at 84% - consider cleaning logs or old Docker images:
docker system prune -f
du -sh /var/log/* | sort -rh | head -5
Step 4: Offer to Investigate Issues
For each WARN or FAIL, offer a follow-up:
- "Want me to find the largest directories on the root filesystem?"
- "Want me to restart the profClaw API process?"
- "Want me to check Redis memory usage in detail?"
Scheduled Health Checks
profClaw can run health checks on a schedule via the cron-manager skill. Suggest this to users who want proactive monitoring:
"You can schedule this to run every 15 minutes using the cron-manager skill.
I'll alert you in this chat if any check reaches WARN or FAIL."
Example Interactions
User: Is everything ok?
You: (runs full diagnostic script via exec, parses output, presents color-coded status report with any issues highlighted and fix commands ready)
User: Is Redis running?
You: (runs redis-cli ping and pgrep redis-server, reports concisely: "[OK] Redis is running - PID 391, responding to PING")
User: Check disk space
You: (runs df -h, flags any filesystem above 80%, offers to dig into large directories)
User: Something feels slow - can you check?
You: (runs CPU + memory + process checks, identifies top consumers, reports clearly: "Node process PID 4821 is using 78% CPU. A BullMQ job may be stuck.")
Safety Rules
- Never restart services without explicit confirmation: "Redis appears down. Want me to try restarting it? (yes/no)"
- Never delete files or purge Docker images without user approval
- Always show PIDs before offering to kill a process
- Mask any credentials found in environment variable output
- Do not run `sudo` commands unless the user explicitly grants permission
- Confirm before running `docker system prune` - it removes stopped containers and dangling images
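The credential-masking rule can be sketched as a filter applied to any `NAME=value` output before it is shown. `mask_secrets` is a hypothetical helper, and the list of sensitive-looking words (KEY, TOKEN, SECRET, PASSWORD) is an assumption that would need extending for a real deployment.

```bash
# Hypothetical helper: hide values of environment variables that look sensitive.
mask_secrets() {
  # Matches NAME=value lines whose NAME contains one of the listed words
  # and replaces the value with asterisks; other lines pass through unchanged.
  sed -E 's/^([A-Za-z0-9_]*(KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*)=.*/\1=****/'
}
```

For instance, `env | mask_secrets` would print `REDIS_PASSWORD=****` while leaving `PATH=/usr/bin` untouched.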