npx skills add nctiggy/claude-skills --skill "spectrocloud-troubleshooting"
Install specific skill from multi-skill repository
# Description
Debug Spectro Cloud Palette clusters using event streams, log bundles, and edge host logs. Tracks known errors - both actionable and noise.
# SKILL.md
name: spectrocloud-troubleshooting
description: Debug Spectro Cloud Palette clusters using event streams, log bundles, and edge host logs. Tracks known errors - both actionable and noise.
Spectro Cloud Troubleshooting
Debug cluster provisioning, upgrades, and operational issues using Palette's event streams and logs.
Key Concepts
- Event Stream: Live cluster events showing orchestration progress and errors
- Cluster Conditions: Milestones like "Creating Infrastructure", "Adding Control Plane Node"
- Log Bundle: Downloadable archive with Spectro logs, system logs, and manifests
- Reconciliation: Palette retries failed operations, so intermittent errors may resolve without intervention
Cluster Event Stream (API)
Events stream live and show orchestration progress. Watch these during provisioning.
Get Cluster Events
# Get recent events for a cluster
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID/events?limit=50" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '[.items[] | {time: .metadata.creationTimestamp, type: .involvedObject.kind,
reason: .reason, message: .message}]'
Get Cluster Status & Conditions
# Overview with conditions and upgrade history
curl -s "https://api.spectrocloud.com/v1/dashboard/spectroclusters/$CLUSTER_UID/overview" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '{state: .status.state, conditions: .status.conditions, upgrades: .status.upgrades}'
Watch Events (Poll)
# Poll events every 10 seconds during provisioning
while true; do
echo "=== $(date) ==="
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID/events?limit=10" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq -r '.items[] | "\(.metadata.creationTimestamp) [\(.reason)] \(.message)"' | head -10
sleep 10
done
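The polling loop above reprints the same events on every pass. One way to show only new lines is to remember what has already been printed. The sketch below assumes each formatted event line is distinct enough to serve as its own key; `print_new_events` and `SEEN_FILE` are hypothetical names, not part of Palette.

```bash
#!/usr/bin/env bash
# Sketch: print only event lines that were not seen on an earlier poll.
# SEEN_FILE and print_new_events are illustrative names (assumptions).
SEEN_FILE="${SEEN_FILE:-${TMPDIR:-/tmp}/palette_events_seen.txt}"

print_new_events() {
  # Reads formatted event lines on stdin, one per line.
  touch "$SEEN_FILE"
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! grep -Fxq -- "$line" "$SEEN_FILE"; then
      printf '%s\n' "$line"
      printf '%s\n' "$line" >> "$SEEN_FILE"
    fi
  done
}
```

In the loop above, piping the `jq -r` output into `print_new_events` instead of `head -10` would print each event once.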
Log Bundle Download
Download comprehensive logs for deep debugging or support requests.
Download via API
# Download log bundle (returns zip file)
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID/features/logFetcher/logs" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" \
-o spectro_logs.zip
# Extract and examine
unzip spectro_logs.zip -d cluster-logs/
ls cluster-logs/
Log Bundle Contents
| File/Folder | Contents |
|---|---|
| `Manifest.yaml` | CRDs, Deployments, Pods, ConfigMaps, Events, Nodes |
| `spectro-*.log` | Palette agent logs (last 10k lines) |
| `system.log` | System logs |
| `cloud-init.log` | Cloud-init output |
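Once the bundle is extracted, a recursive search can surface the actionable error patterns listed later in this document. This is a minimal sketch that assumes the logs are plain text under the extraction directory; `scan_bundle` is a hypothetical helper name, and the pattern list is only a subset.

```bash
#!/usr/bin/env bash
# Sketch: search an extracted log bundle for some of the actionable
# error patterns from this document. scan_bundle is an illustrative name.
scan_bundle() {
  local dir="${1:-cluster-logs}"
  # -r: recurse, -n: show line numbers, -i: ignore case, -E: extended regex
  grep -rniE \
    'failed to pull image|insufficient resources|pack validation failed|certificate has expired|duplicate edge host ID' \
    "$dir"
}
```

Running `scan_bundle cluster-logs/` prints matching lines as `file:line:text`. Like `grep`, it returns a non-zero status when nothing matches.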
Edge Host Logs (SSH)
For edge deployments, SSH to the host for live debugging.
# SSH to edge host (default creds: kairos/kairos)
ssh kairos@<edge-host-ip>
# Watch Palette agent logs (most useful)
sudo journalctl -u spectro-stylus-agent.service -f
# Check all Spectro services
sudo systemctl list-units 'spectro*'
# Key log files
cat /var/log/stylus-upgrade.log
cat /var/log/kube-init.log
cat /var/log/kube-join.log
# Check kubelet if K8s is initializing
sudo journalctl -u kubelet -f
Cluster State Reference
| State | Meaning |
|---|---|
| `Pending` | Waiting for resources |
| `Provisioning` | Creating infrastructure/nodes |
| `Running` | Healthy and operational |
| `Updating` | Profile or config change in progress |
| `Deleting` | Teardown in progress |
| `Failed` | Unrecoverable error (check events) |
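The state table can drive a simple wait loop that stops once the cluster reaches `Running` or `Failed`. The sketch below reuses the cluster endpoint and the `.status.state` field from the Debugging Workflow. It assumes the API returns the state strings exactly as written in the table; `is_settled_state`, `wait_for_cluster`, and the 30-second interval are illustrative choices.

```bash
#!/usr/bin/env bash
# Sketch: poll the cluster state until it settles.
# Assumes state strings match the table above (unverified assumption).

# Returns 0 when no further waiting is useful (Running or Failed).
is_settled_state() {
  case "$1" in
    Running|Failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls up to $1 times (default 60), 30 seconds apart.
# Returns 0 only if the final observed state is Running.
wait_for_cluster() {
  local max_polls="${1:-60}" state="" i=0
  while [ "$i" -lt "$max_polls" ]; do
    state="$(curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID" \
      -H "ApiKey: $PALETTE_API_KEY" \
      -H "ProjectUid: $PROJECT_UID" | jq -r '.status.state')"
    echo "$(date) state=$state"
    if is_settled_state "$state"; then break; fi
    i=$((i + 1))
    sleep 30
  done
  [ "$state" = "Running" ]
}
```

The poll limit keeps the loop from running forever if the API is unreachable or the state never settles.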
Known Errors: Noise vs. Actionable
Palette's reconciliation pattern means some errors are transient and resolve automatically.
Normal/Transient (Usually Safe to Ignore)
These often appear during provisioning but resolve on their own:
| Error Pattern | Why It's OK |
|---|---|
| `context deadline exceeded` | Temporary timeout, will retry |
| `connection refused` to API server | Node not ready yet, will retry |
| `unable to retrieve node` | Node registering, will resolve |
| `waiting for control plane` | Normal during bootstrap |
| `etcd cluster is not healthy` | Etcd initializing, give it time |
Actionable Errors (Investigate)
These indicate real problems:
| Error Pattern | Likely Cause | Action |
|---|---|---|
| `failed to pull image` | Registry auth or network | Check registry creds, network |
| `node not found` (persistent) | Edge host not registered | Check edge host in Palette UI |
| `insufficient resources` | Node too small | Increase CPU/RAM |
| `pack validation failed` | Profile misconfigured | Check pack values |
| `version mismatch` | K8s/image version conflict | Align provider image with K8s pack |
| `duplicate edge host ID` | Re-imaging without cleanup | Delete old edge host from Palette |
| `certificate has expired` | Stale certs | Re-register edge host |
2-Node Specific Errors
| Error Pattern | Likely Cause | Action |
|---|---|---|
| `leader election failed` | Both nodes competing | Check network between nodes |
| `kine connection refused` | Postgres/Kine not ready | Wait, check node logs |
| `split brain` | Network partition | Restore network, may need recovery |
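The tables above can be turned into a rough first-pass triage of event messages. The following is a minimal sketch, not an exhaustive matcher: `classify_event` is a hypothetical name, and it assumes the patterns appear as substrings of the event message.

```bash
#!/usr/bin/env bash
# Sketch: label an event message as actionable, transient, or unknown
# using the error patterns from the tables above (substring match).
classify_event() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *"failed to pull image"*|*"insufficient resources"*|*"pack validation failed"*|\
    *"version mismatch"*|*"duplicate edge host id"*|*"certificate has expired"*|\
    *"leader election failed"*|*"split brain"*)
      echo actionable ;;
    *"context deadline exceeded"*|*"connection refused"*|*"unable to retrieve node"*|\
    *"waiting for control plane"*|*"etcd cluster is not healthy"*)
      echo transient ;;
    *)
      echo unknown ;;
  esac
}
```

Two caveats follow from the tables: `node not found` only matters when it persists, which a single message cannot show, so it falls through to `unknown`; and `kine connection refused` matches the transient `connection refused` pattern, consistent with its "Wait" action.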
Debugging Workflow
1. Check Cluster State
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '{name: .metadata.name, state: .status.state, health: .status.clusterHealth}'
2. Check Active Condition
curl -s "https://api.spectrocloud.com/v1/dashboard/spectroclusters/$CLUSTER_UID/overview" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '.status.conditions[] | select(.status != "True") | {type, message, reason}'
3. Recent Events
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID/events?limit=20" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq -r '.items[] | "\(.metadata.creationTimestamp) [\(.reason)] \(.message)"'
4. SSH to Edge Host (if applicable)
ssh kairos@<edge-host-ip>
sudo journalctl -u spectro-stylus-agent.service -n 100 --no-pager
5. Download Full Logs (for support)
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID/features/logFetcher/logs" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" \
-o spectro_logs_$(date +%Y%m%d_%H%M).zip
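Every step in this workflow repeats the same base URL and headers. A small wrapper can cut that repetition. In this sketch, `palette_get` and `PALETTE_API` are hypothetical names; the headers and host are the ones used throughout this document.

```bash
#!/usr/bin/env bash
# Sketch: wrap the curl invocation shared by every API call in this document.
# palette_get and PALETTE_API are illustrative names (assumptions).
PALETTE_API="${PALETTE_API:-https://api.spectrocloud.com}"

palette_get() {
  # $1 is the request path (for example /v1/edgehosts);
  # any remaining arguments are passed through to curl.
  local path="$1"
  shift
  curl -s "${PALETTE_API}${path}" \
    -H "ApiKey: $PALETTE_API_KEY" \
    -H "ProjectUid: $PROJECT_UID" \
    "$@"
}
```

With this in place, step 3 could be written as `palette_get "/v1/spectroclusters/$CLUSTER_UID/events?limit=20"` piped into the same `jq` filter, and step 5 as `palette_get "/v1/spectroclusters/$CLUSTER_UID/features/logFetcher/logs" -o spectro_logs.zip`.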
Edge Host Registration Issues
# List edge hosts and their state
curl -s "https://api.spectrocloud.com/v1/edgehosts" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '[.items[] | {name: .metadata.name, uid: .metadata.uid, state: .status.state,
health: .status.health.state, cluster: .status.clusterUid}]'
# Check specific edge host
curl -s "https://api.spectrocloud.com/v1/edgehosts/$EDGEHOST_UID" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '{name: .metadata.name, state: .status.state, health: .status.health,
lastHeartbeat: .status.lastHeartbeat}'
Pack Deployment Issues
# Check pack status on cluster
curl -s "https://api.spectrocloud.com/v1/spectroclusters/$CLUSTER_UID/packs/status" \
-H "ApiKey: $PALETTE_API_KEY" \
-H "ProjectUid: $PROJECT_UID" | \
jq '[.items[] | {name: .name, status: .status, message: .message}]'
Adding New Learnings
This skill is a living document. When you encounter a new error:
- Determine whether it is noise (transient, resolved by retry) or actionable
- Add it to the appropriate table above
- Include the error pattern, likely cause, and recommended action
- Commit with a descriptive message
Quick Reference
| Operation | Endpoint |
|---|---|
| Cluster events | GET /v1/spectroclusters/{uid}/events |
| Cluster overview | GET /v1/dashboard/spectroclusters/{uid}/overview |
| Download logs | GET /v1/spectroclusters/{uid}/features/logFetcher/logs |
| Edge hosts | GET /v1/edgehosts |
| Pack status | GET /v1/spectroclusters/{uid}/packs/status |