digitalocean-labs

app-platform-troubleshooting

2
1
# Install this skill:
npx skills add digitalocean-labs/do-app-platform-skills --skill "app-platform-troubleshooting"

Install specific skill from multi-skill repository

# Description

Debug running App Platform applications by accessing containers, analyzing logs, running diagnostics, and applying fixes. Use when apps fail to deploy, crash at runtime, have connectivity issues, or need performance diagnosis.

# SKILL.md


name: app-platform-troubleshooting
version: 1.0.0
min_doctl_version: "1.82.0"
description: Debug running App Platform applications by accessing containers, analyzing logs, running diagnostics, and applying fixes. Use when apps fail to deploy, crash at runtime, have connectivity issues, or need performance diagnosis.
related_skills: [deployment, networking, postgres]
deprecated: false


App Platform Troubleshooting Skill

Transform debugging from guessing to rapid diagnosis and fix.

Philosophy

Traditional: See error → Guess → Change → Push → Wait 5-7 min → Repeat
With skill:  See error → Diagnose → Fix → Verify → Commit proper fix

Quick Decision

Is the app deployed with running containers?
├── YES → Can we access the shell?
│         ├── YES → LIVE MODE (SDK shell access)
│         └── NO  → LOGS-ONLY MODE (fetch logs)
└── NO (build/deploy failed) → LOGS-ONLY MODE

Mode 1: Live Troubleshooting (Quick Start)

from do_app_sandbox import Sandbox

app = Sandbox.get_from_id(app_id="<app-id>", component="web")

# Diagnostics
app.exec("env | grep DATABASE")
app.exec("curl -v localhost:8080/health")
app.exec("ps aux | head -10")

Full guide: See live-troubleshooting.md


Mode 2: Logs-Only (Quick Start)

# Runtime logs
doctl apps logs <app_id> <component> --type run

# Build logs
doctl apps logs <app_id> <component> --type build

# Crash logs
doctl apps logs <app_id> --type=run_restarted

Full guide: See logs-analysis.md


Debug Container (Infrastructure Issues)

Deploy in ~30-45 seconds to isolate infrastructure from application:

services:
  - name: debug
    image:
      registry_type: GHCR
      registry: ghcr.io
      repository: bikramkgupta/debug-python
      tag: latest
    http_port: 8080
    envs:
      - key: DATABASE_URL
        value: ${db.DATABASE_URL}
# Run validation suite
validate-infra all
validate-infra database
validate-infra kafka

Full guide: See debug-container.md


Quick Reference: Exit Codes

Code Signal Meaning
0 - Clean exit (shouldn't exit)
1 - General error
127 - Command not found
137 SIGKILL OOM killed
143 SIGTERM Graceful shutdown

Quick Reference: Common Fixes

Problem Quick Fix
App exits immediately Check if listening on $PORT
502 errors Check health endpoint, verify running
Database connection fails Use Debug Container, verify trusted sources
Build fails Check dependencies, review build logs
OOM kills Upgrade instance size
Health checks fail Bind to 0.0.0.0, not localhost
Slow startup Increase initial_delay_seconds

Reference Files


When to Escalate

Contact DigitalOcean Support when:
- Internal error persists after redeploy
- Resource limit increases needed
- Multiple apps affected (platform issue)
- VPC/networking issues can't be diagnosed

Before escalating, gather:

doctl apps get <app_id> -o json > app_info.json
doctl apps logs <app_id> <component> --type run > runtime.log
doctl apps spec get <app_id> > app_spec.yaml

Integration with Other Skills

  • → deployment: After fixing, deploy proper changes
  • → devcontainers: Reproduce issues locally
  • → postgres: Database-specific configuration
  • → networking: Comprehensive networking docs

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.