build-debugging

Name: build-debugging
Author: mcncl

# Install this skill:

npx skills add mcncl/skill-buildkite --skill "build-debugging"

Install specific skill from multi-skill repository

# SKILL.md

---
name: build-debugging
description: |
  Analyzes failed Buildkite builds to identify root causes. Use when user asks:
  - "Why did build X fail?"
  - "Debug this build"
  - "What's wrong with my CI?"
  - "Fix this build failure"
  - "Help me understand this error"
  - /buildkite:debug
---

Build Debugging

Analyze failed Buildkite builds to identify root causes and provide actionable fixes.

Available MCP Tools

| Tool | Purpose |
|------|---------|
| `buildkite_get_build` | Fetch build details including all jobs and their states |
| `buildkite_read_logs` | Get full log output for a specific job |
| `buildkite_search_logs` | Search for patterns within job logs |
| `buildkite_get_build_test_engine_runs` | Get Test Engine results for the build |
| `buildkite_get_failed_test_executions` | Get details of failed tests |
| `buildkite_list_artifacts_for_build` | List uploaded artifacts |
| `buildkite_get_artifact` | Download a specific artifact |

Input Parsing

Parse build information from `$ARGUMENTS` or the user's message:

| Input Format | Example |
|--------------|---------|
| Full URL | `https://buildkite.com/org/pipeline/builds/123` |
| Build number | `123` |
| Pipeline + build | `my-pipeline#123` or `my-pipeline 123` |
| Description | "the latest failed build on main" |

If no build is specified, ask the user which build to debug.
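
As a rough illustration of the input formats above, the following Python sketch normalizes a build reference into its parts. The function name `parse_build_ref`, the returned dictionary shape, and the regular expressions are assumptions made for this example and are not part of the skill. Free-form descriptions such as "the latest failed build on main" return `None`, because they need to be resolved some other way (or by asking the user).

```python
import re
from typing import Optional

# Hypothetical patterns for the input formats listed in the table above.
URL_RE = re.compile(r"https?://buildkite\.com/([^/\s]+)/([^/\s]+)/builds/(\d+)")
PIPELINE_RE = re.compile(r"^([\w.-]+)(?:#|\s+)(\d+)$")


def parse_build_ref(text: str) -> Optional[dict]:
    """Return {'org', 'pipeline', 'number'} or None if no build is identifiable."""
    text = text.strip()

    # Full URL: https://buildkite.com/<org>/<pipeline>/builds/<number>
    match = URL_RE.search(text)
    if match:
        return {
            "org": match.group(1),
            "pipeline": match.group(2),
            "number": int(match.group(3)),
        }

    # Bare build number: "123"
    if text.isdigit():
        return {"org": None, "pipeline": None, "number": int(text)}

    # Pipeline + build: "my-pipeline#123" or "my-pipeline 123"
    match = PIPELINE_RE.match(text)
    if match:
        return {"org": None, "pipeline": match.group(1), "number": int(match.group(2))}

    # Free-form description: cannot be parsed mechanically.
    return None
```

A caller would typically fall back to asking the user whenever this returns `None` or leaves `pipeline` unset.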

Approach

1. Fetch the build with `buildkite_get_build`
   - Note the overall state, branch, commit, and message
   - Check if this is a retry or first attempt
2. Identify failed jobs in the `jobs` array
   - Look for `state: "failed"` or `state: "timed_out"`
   - Note job names/labels to understand what failed
   - Check job exit codes
3. Read logs with `buildkite_read_logs` for failed jobs
   - Focus on the last 50-100 lines where failures surface
   - Look for the FIRST error, not just the last (cascading failures are common)
4. Check test results if applicable
   - Use `buildkite_get_build_test_engine_runs` for Test Engine data
   - Use `buildkite_get_failed_test_executions` for failure details
5. Review artifacts for additional context
   - Test reports, coverage data, debug outputs
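
Steps 2 and 3 can be sketched in Python as below. This is only an illustration: it assumes the build has already been fetched into a plain dictionary with a `jobs` list whose entries carry a `state` key, and that the log is available as a single string. The key names `name` and `exit_status`, the helper names, and the error keywords are hypothetical, so the actual shape returned by the MCP tools may differ.

```python
import re
from typing import Any

# Job states treated as failures, per the approach above.
FAILED_STATES = {"failed", "timed_out"}

# Hypothetical keyword heuristic for spotting error lines in a log.
ERROR_RE = re.compile(r"\b(error|fatal|exception|traceback)\b", re.IGNORECASE)


def failed_jobs(build: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the jobs whose state marks them as failed or timed out."""
    return [job for job in build.get("jobs", []) if job.get("state") in FAILED_STATES]


def first_error(log_text: str, context: int = 2) -> list[str]:
    """Return the FIRST error-looking line plus surrounding context lines.

    Falls back to the last 50 lines when no keyword matches, since
    failures usually surface near the end of the log.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if ERROR_RE.search(line):
            return lines[max(0, index - context): index + context + 1]
    return lines[-50:]
```

Scanning from the top rather than the bottom reflects the cascading-failure point: later errors are often consequences of the first one.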

Common Failure Patterns

Exit Codes

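| Code | Meaning | Action |
|------|---------|--------|
| 1 | General error | Check command output |
| 127 | Command not found | Missing dependency or PATH issue |
| 137 | SIGKILL (128+9), most often the OOM killer | Increase memory or reduce memory usage |
| 143 | SIGTERM (128+15) | Timeout or cancelled |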
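
The 128+N entries follow the common shell convention that a process killed by signal N reports exit code 128+N. A minimal Python sketch of that arithmetic follows; the function name `describe_exit_code` and the small signal table are illustrative choices for this example, not something the skill defines.

```python
# Small, illustrative subset of POSIX signal numbers.
SIGNAL_NAMES = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Map a job exit code to a short human-readable description."""
    if code == 0:
        return "success"
    if code == 1:
        return "general error"
    if code == 127:
        return "command not found"
    # Shell convention: 128 + N means "terminated by signal N".
    # Linux signal numbers stop at 64, so larger codes are left as-is.
    if 128 < code <= 128 + 64:
        signal_number = code - 128
        name = SIGNAL_NAMES.get(signal_number, f"signal {signal_number}")
        return f"terminated by {name}"
    return f"exit code {code}"
```

For example, `describe_exit_code(137)` yields `"terminated by SIGKILL"`, which is the cue to check for an out-of-memory kill.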

Test Failures

- Flaky tests: Check if the same test passed on retry
- Environment differences: Compare agent tags and env vars
- Timing issues: Race conditions or async problems
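
The flaky-test check can be approximated by comparing outcomes across attempts of the same build. The sketch below assumes each attempt has already been reduced to a mapping from test name to `"passed"` or `"failed"`; that shape and the function name `find_flaky_tests` are hypothetical and do not reflect the Test Engine response format.

```python
from collections import defaultdict


def find_flaky_tests(attempts: list[dict[str, str]]) -> set[str]:
    """Return tests that both passed and failed across the given attempts."""
    outcomes: dict[str, set[str]] = defaultdict(set)
    for attempt in attempts:
        for test_name, result in attempt.items():
            outcomes[test_name].add(result)
    # A test is a flaky candidate only if it shows both outcomes.
    return {name for name, seen in outcomes.items() if {"passed", "failed"} <= seen}
```

A test that fails on every attempt is not returned, which helps separate genuine regressions from intermittent failures.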

Infrastructure Issues

- Agent disconnected: Network or agent health
- Timeout: Job exceeded `timeout_in_minutes`
- No agents: Check queue and agent tags

Response Format

1. Summary: One-line description of what failed
2. Root Cause: What actually caused the failure
3. Evidence: Relevant log snippets (use code blocks)
4. Recommendation: Specific steps to fix
5. Prevention: How to avoid this in future (if applicable)

Example Interaction

User: Why did build 456 fail?

1. Fetch build 456 with buildkite_get_build
2. Find failed job: "Run Tests" with exit code 1
3. Read logs, find: "Error: Cannot find module 'lodash'"
4. Respond with root cause (missing dependency) and fix (add to package.json)
