# llama-farm / fix-ci

# Install this skill:
npx skills add llama-farm/llamafarm --skill "fix-ci"

Install specific skill from multi-skill repository

# Description

Fetch GitHub CI failure information, analyze root causes, reproduce locally, and propose a fix plan. Use `/fix-ci` for current branch or `/fix-ci <run-id>` for a specific run.

# SKILL.md


---
name: fix-ci
description: Fetch GitHub CI failure information, analyze root causes, reproduce locally, and propose a fix plan. Use /fix-ci for current branch or /fix-ci <run-id> for a specific run.
allowed-tools: Bash, Read, Grep, Glob, Task, AskUserQuestion, EnterPlanMode
---


# Fix CI Skill

Automates CI troubleshooting by fetching GitHub Actions failures, analyzing logs, reproducing issues locally, and creating a fix plan for user approval.


## Execution Workflow

### Step 1: Prerequisites Check

Verify the GitHub CLI is installed and authenticated:

gh --version && gh auth status

If gh is not installed:
- Inform user: "GitHub CLI is required. Install with: brew install gh"
- Exit gracefully

If not authenticated:
- Inform user: "Please authenticate with: gh auth login"
- Exit gracefully
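
A minimal shell sketch of this check, reusing the messages above and treating "exit gracefully" as a zero exit code:

```bash
# Sketch of the prerequisites check; messages mirror the guidance above
if ! command -v gh >/dev/null 2>&1; then
  echo "GitHub CLI is required. Install with: brew install gh"
  exit 0  # exit gracefully
fi
if ! gh auth status >/dev/null 2>&1; then
  echo "Please authenticate with: gh auth login"
  exit 0  # exit gracefully
fi
```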

### Step 2: Parse Arguments

Determine the mode based on arguments:

  • No arguments (/fix-ci): Fetch failures for the current branch only
  • With run-id (/fix-ci <run-id>): Fetch specific run (bypasses branch scoping)
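
A small sketch of the mode selection, assuming the optional run-id arrives as the first positional argument (`$1`):

```bash
# Hypothetical argument handling: no argument means branch mode, otherwise run mode
RUN_ID="${1:-}"
if [ -z "$RUN_ID" ]; then
  MODE="branch"   # /fix-ci: scope to the current branch
else
  MODE="run"      # /fix-ci <run-id>: target the given run
fi
```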

### Step 3: Fetch Failed Run

Default mode (current branch):

BRANCH=$(git branch --show-current)
gh run list --branch "$BRANCH" --status failure --limit 1 --json databaseId,name,headBranch,workflowName,createdAt

Specific run mode:

gh run view <run-id> --json databaseId,name,headBranch,workflowName,jobs,conclusion

If no failures found:
- Report: "No failed runs found for branch $BRANCH. CI is green!"
- Optionally show recent successful runs:

gh run list --branch "$BRANCH" --limit 3 --json databaseId,conclusion,workflowName,createdAt
- Exit gracefully
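
A sketch tying the default mode together, extracting the run id via the gh CLI's `--jq` flag:

```bash
# Sketch: grab the most recent failed run on the current branch and exit if there is none
BRANCH=$(git branch --show-current)
RUN_ID=$(gh run list --branch "$BRANCH" --status failure --limit 1 \
  --json databaseId --jq '.[0].databaseId')
if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
  echo "No failed runs found for branch $BRANCH. CI is green!"
  exit 0
fi
```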

### Step 4: Get Failure Details

Once a failed run is identified, gather comprehensive details:

RUN_ID=<the-run-id>

# Get failed jobs with their steps
gh run view $RUN_ID --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name, conclusion, steps: [.steps[] | select(.conclusion == "failure")]}'

# Get failed step logs (critical for debugging)
gh run view $RUN_ID --log-failed 2>&1 | head -500

# Get verbose run info
gh run view $RUN_ID --verbose

Log handling:
- Truncate logs to 500 lines to avoid context overflow
- Note to user: "Showing first 500 lines of failed logs. Full logs available on GitHub."
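
When the 500-line window is still noisy, one option is to pre-filter for error-bearing lines; the patterns in this sketch are illustrative, not exhaustive:

```bash
# Sketch: filter the failed-step logs down to likely error lines before analysis
gh run view "$RUN_ID" --log-failed 2>&1 \
  | grep -E -i 'error|fail|exception|traceback|panic' \
  | head -100
```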

### Step 5: Download Artifacts (if available)

Attempt to download any debug artifacts:

# Try common artifact names - failures are OK (not all runs have artifacts)
gh run download $RUN_ID -n "coverage" -D /tmp/ci-debug/ 2>/dev/null || true
gh run download $RUN_ID -n "test-results" -D /tmp/ci-debug/ 2>/dev/null || true
gh run download $RUN_ID -n "logs" -D /tmp/ci-debug/ 2>/dev/null || true

If artifacts downloaded, read them for additional context.
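
Alternatively, omitting `-n` downloads every artifact the run produced, which avoids guessing names; this sketch likewise tolerates failures:

```bash
# Sketch: fetch all artifacts for the run into a scratch directory, ignoring errors
mkdir -p /tmp/ci-debug
gh run download "$RUN_ID" -D /tmp/ci-debug/ 2>/dev/null || true
ls -R /tmp/ci-debug/  # inspect whatever actually came down
```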

### Step 6: Analyze Failure Type

Categorize the failure based on log patterns:

| Pattern | Failure Type | Root Cause Area |
|---------|--------------|-----------------|
| `FAIL:`, `--- FAIL`, `FAILED` | Test Failure | Specific test case |
| `ruff check`, `ruff format` | Lint Error | Code style/formatting |
| `ModuleNotFoundError`, `ImportError` | Import Error | Missing dependency |
| `TypeError`, `AttributeError` | Runtime Error | Type mismatch |
| `SyntaxError` | Syntax Error | Invalid code |
| `AssertionError` | Assertion Failure | Test expectation mismatch |
| `TimeoutError`, `timed out` | Timeout | Performance/hang |
| `PermissionError`, `EACCES` | Permission Error | File/resource access |
| `ConnectionError`, `ECONNREFUSED` | Network Error | External service |

Extract key information:
- Failed test name/file (if applicable)
- Error message
- Stack trace location (file:line)
- Environment variables or config issues
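
A sketch of how this categorization could be scripted over a saved copy of the logs; the `/tmp/ci-debug/failed.log` path and the handful of categories shown are illustrative:

```bash
# Sketch: classify the failure from log patterns (only a few categories from the table shown)
LOG_FILE=/tmp/ci-debug/failed.log
if grep -qE 'FAIL:|--- FAIL|FAILED' "$LOG_FILE"; then
  FAILURE_TYPE="Test Failure"
elif grep -qE 'ruff (check|format)' "$LOG_FILE"; then
  FAILURE_TYPE="Lint Error"
elif grep -qE 'ModuleNotFoundError|ImportError' "$LOG_FILE"; then
  FAILURE_TYPE="Import Error"
else
  FAILURE_TYPE="Other"
fi
echo "Detected failure type: $FAILURE_TYPE"
```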

### Step 7: Map to Local Test Commands

Determine the appropriate local command based on the CI job:

| CI Workflow/Job | Local Command |
|-----------------|---------------|
| test-cli | `cd cli && go test ./...` |
| test-python (server) | `cd server && uv run pytest -v` |
| test-python (rag) | `cd rag && uv run pytest -v` |
| test-python (config) | `cd config && uv run pytest -v` |
| test-python (runtime) | `cd runtimes/universal && uv run pytest -v` |
| lint (python) | `uv run ruff check .` |
| lint (go) | `cd cli && golangci-lint run` |
| type-check | `uv run mypy .` |
| build-cli | `nx build cli` |
| build-designer | `cd designer && npm run build` |

For specific test failures, narrow down the command:
- Python: cd <dir> && uv run pytest -v <test_file>::<test_name>
- Go: cd cli && go test -v -run <TestName> ./...
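
One way to express this mapping in shell, using glob patterns against the failed job's name; the variable names and fallback behavior here are assumptions:

```bash
# Sketch: map a failed CI job name (from Step 4) to the local command from the table above
case "$JOB_NAME" in
  test-cli)             LOCAL_CMD="cd cli && go test ./..." ;;
  test-python*server*)  LOCAL_CMD="cd server && uv run pytest -v" ;;
  test-python*rag*)     LOCAL_CMD="cd rag && uv run pytest -v" ;;
  lint*python*)         LOCAL_CMD="uv run ruff check ." ;;
  *)                    LOCAL_CMD="" ;;  # unknown job: fall back to asking the user
esac
```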

### Step 8: Reproduce Locally

Run the mapped local command to confirm the failure reproduces:

# Example for Python test
cd server && uv run pytest -v tests/test_api.py::test_health_check

Outcome A - Failure reproduces locally:
- Good! Continue to fix plan
- Report: "Successfully reproduced failure locally"

Outcome B - Failure does NOT reproduce locally:
- Note: "Could not reproduce locally. Possible causes:"
- Flaky test (timing-dependent)
- Environment difference (CI has different deps/config)
- Race condition
- Suggest: "Consider re-running CI with gh run rerun $RUN_ID"
- Ask user how to proceed (investigate further or skip)
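
A sketch of how the two outcomes could be distinguished by exit code, assuming `LOCAL_CMD` was set in Step 7:

```bash
# Sketch: run the mapped command in a subshell and branch on its exit status
if (eval "$LOCAL_CMD"); then
  echo "Could not reproduce locally - possible flaky test or environment difference."
  echo "Consider re-running CI with: gh run rerun $RUN_ID"
else
  echo "Successfully reproduced failure locally"
fi
```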

### Step 9: Analyze Root Cause

Based on the failure type and logs, identify:

  1. What failed: Specific test, lint rule, or build step
  2. Why it failed: The actual error condition
  3. Where to fix: File(s) and line(s) that need changes
  4. How to fix: Proposed changes

Use available tools to explore:
- Read the failing test file
- Read the code being tested
- Search for related patterns in the codebase
- Check recent changes that might have caused the failure
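
For the last point, a sketch of checking recent history around the implicated files (the paths here are placeholders):

```bash
# Sketch: inspect recent changes to the files named in the stack trace
git log --oneline -10 -- path/to/failing_test.py path/to/code_under_test.py
git diff origin/main...HEAD -- path/to/code_under_test.py
```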

### Step 10: Enter Plan Mode

Use EnterPlanMode to create a formal fix plan. The plan should include:

# CI Fix Plan

## Problem Statement
[Summary of the CI failure from logs]

## Failure Details
- **Run ID**: <run-id>
- **Workflow**: <workflow-name>
- **Job**: <job-name>
- **Error Type**: <categorized-type>

## Root Cause Analysis
[Explanation of why the failure occurred]

## Affected Files
- `path/to/file1.py` (line X)
- `path/to/file2.py` (line Y)

## Proposed Changes

### Change 1: [Brief description]
[Specific edit to make]

### Change 2: [Brief description]
[Specific edit to make]

## Verification Steps
1. Run: `<local-test-command>`
2. Expected: All tests pass
3. Optional: Run full test suite with `<full-suite-command>`

## Notes
- [Any caveats or considerations]

### Step 11: User Approval Gate

Present the plan and wait for explicit user approval:
- User approves: Proceed to execute fixes
- User modifies: Incorporate feedback, update plan
- User rejects: Exit gracefully without changes

CRITICAL: Never make code changes without user approval.

### Step 12: Execute Fix (after approval only)

  1. Make the proposed code changes using the Edit tool
  2. Run local tests to verify the fix: <local-test-command>
  3. Report results:
     - Success: "Fix verified locally. Tests pass."
     - Failure: "Fix did not resolve the issue. [details]"

IMPORTANT: Do NOT auto-commit changes. Leave committing to the user or /commit-push-pr skill.


## Error Handling

| Scenario | Action |
|----------|--------|
| gh CLI not installed | Direct user to install: `brew install gh` |
| gh not authenticated | Direct user to: `gh auth login` |
| No failures found | Report CI is green, exit gracefully |
| Rate limit exceeded | Suggest waiting or using `gh auth refresh` |
| Run not found | Verify run ID, suggest `gh run list` to find valid IDs |
| Large logs (>500 lines) | Truncate, note full logs on GitHub |
| Local reproduction fails | Note as flaky/env issue, offer re-run option |
| Network errors | Suggest retry, check connection |

## Output Format

On finding a failure:

CI Failure Found
Run: #12345 (workflow-name)
Branch: feature-branch
Failed Job: test-python
Error Type: Test Failure

Analyzing logs...
[Summary of failure]

Reproducing locally...
[Result]

Entering plan mode to propose fix...

On success (after fix):

Fix Applied
- Modified: path/to/file.py
- Verification: Tests pass locally

Next steps:
- Review the changes
- Run `/commit-push-pr` to commit and push
- CI will re-run automatically on push

## Notes for the Agent

  1. Always scope to current branch by default - Users expect /fix-ci to fix their current work, not random failures
  2. Truncate logs wisely - CI logs can be huge; extract the relevant error sections
  3. Reproduce before fixing - Don't propose fixes for issues that can't be reproduced
  4. Plan mode is mandatory - Always use EnterPlanMode before making changes
  5. Never auto-commit - The user controls when changes are committed
  6. Be specific in analysis - Generic advice isn't helpful; identify exact files and lines
  7. Handle flaky tests - If reproduction fails, acknowledge it might be flaky

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.