astronomer

authoring-dags

11
0
# Install this skill:
npx skills add astronomer/agents --skill "authoring-dags"

Install specific skill from multi-skill repository

# Description

Workflow and best practices for writing Apache Airflow DAGs. Use when the user wants to create a new DAG, write pipeline code, or asks about DAG patterns and conventions. For testing and debugging DAGs, see the testing-dags skill.

# SKILL.md


name: authoring-dags
description: Workflow and best practices for writing Apache Airflow DAGs. Use when the user wants to create a new DAG, write pipeline code, or asks about DAG patterns and conventions. For testing and debugging DAGs, see the testing-dags skill.
hooks:
Stop:
- hooks:
- type: command
command: "echo 'Remember to test your DAG with the testing-dags skill'"


DAG Authoring Skill

This skill guides you through creating and validating Airflow DAGs using best practices and MCP tools.

For testing and debugging DAGs, see the testing-dags skill which covers the full test β†’ debug β†’ fix β†’ retest workflow.


⚠️ CRITICAL WARNING: Use MCP Tools, NOT CLI Commands ⚠️

STOP! Before running ANY Airflow-related command, read this.

You MUST use MCP tools for ALL Airflow interactions. CLI commands like astro dev run, airflow dags, or shell commands to read logs are FORBIDDEN.

Why? MCP tools provide structured, reliable output. CLI commands are fragile, produce unstructured text, and often fail silently.


CLI vs MCP Quick Reference

ALWAYS use Airflow MCP tools. NEVER use CLI commands.

❌ DO NOT USE βœ… USE INSTEAD
astro dev run dags list list_dags MCP tool
airflow dags list list_dags MCP tool
astro dev run dags test trigger_dag_and_wait MCP tool
airflow tasks test trigger_dag_and_wait MCP tool
cat / grep on Airflow logs get_task_logs MCP tool
find in dags folder list_dags or explore_dag MCP tool
Any astro dev run ... Equivalent MCP tool
Any airflow ... CLI Equivalent MCP tool
ls on /usr/local/airflow/dags/ list_dags or explore_dag MCP tool
cat ... \| jq to filter MCP results Read the JSON directly from MCP response

Remember:
- βœ… Airflow is ALREADY running β€” the MCP server handles the connection
- ❌ Do NOT attempt to start, stop, or manage the Airflow environment
- ❌ Do NOT use shell commands to check DAG status, logs, or errors
- ❌ Do NOT use bash to parse or filter MCP tool results β€” read the JSON directly
- ❌ Do NOT use ls, find, or cat on Airflow container paths (/usr/local/airflow/...)
- βœ… ALWAYS use MCP tools β€” they return structured JSON you can read directly

Workflow Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. DISCOVER                         β”‚
β”‚    Understand codebase & environmentβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. PLAN                             β”‚
β”‚    Propose structure, get approval  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. IMPLEMENT                        β”‚
β”‚    Write DAG following patterns     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 4. VALIDATE                         β”‚
β”‚    Check import errors, warnings    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 5. TEST (with user consent)         β”‚
β”‚    Trigger, monitor, check logs     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 6. ITERATE                          β”‚
β”‚    Fix issues, re-validate          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 1: Discover

Before writing code, understand the context.

Explore the Codebase

Use file tools to find existing patterns:
- Glob for **/dags/**/*.py to find existing DAGs
- Read similar DAGs to understand conventions
- Check requirements.txt for available packages

Query the Airflow Environment

Use MCP tools to understand what's available:

Tool Purpose
list_connections What external systems are configured
list_variables What configuration values exist
list_providers What operator packages are installed
get_airflow_version Version constraints and features
list_dags Existing DAGs and naming conventions
list_pools Resource pools for concurrency

Example discovery questions:
- "Is there a Snowflake connection?" β†’ list_connections
- "What Airflow version?" β†’ get_airflow_version
- "Are S3 operators available?" β†’ list_providers


Phase 2: Plan

Based on discovery, propose:

  1. DAG structure - Tasks, dependencies, schedule
  2. Operators to use - Based on available providers
  3. Connections needed - Existing or to be created
  4. Variables needed - Existing or to be created
  5. Packages needed - Additions to requirements.txt

Get user approval before implementing.


Phase 3: Implement

Write the DAG following best practices (see below). Key steps:

  1. Create DAG file in appropriate location
  2. Update requirements.txt if needed
  3. Save the file

Phase 4: Validate

Use the Airflow MCP as a feedback loop. Do NOT use CLI commands.

Step 1: Check Import Errors

After saving, call the MCP tool (Airflow will have already parsed the file):

MCP tool: list_import_errors

  • If your file appears β†’ fix and retry
  • If no errors β†’ continue

Common causes: missing imports, syntax errors, missing packages.

Step 2: Verify DAG Exists

MCP tool: get_dag_details(dag_id="your_dag_id")

Check: DAG exists, schedule correct, tags set, paused status.

Step 3: Check Warnings

MCP tool: list_dag_warnings

Look for deprecation warnings or configuration issues.

Step 4: Explore DAG Structure

MCP tool: explore_dag(dag_id="your_dag_id")

Returns in one call: metadata, tasks, dependencies, source code.


Phase 5: Test

πŸ“˜ See the testing-dags skill for comprehensive testing guidance.

Once validation passes, test the DAG using the workflow in the testing-dags skill:

  1. Get user consent β€” Always ask before triggering
  2. Trigger and wait β€” Use trigger_dag_and_wait(dag_id, timeout=300)
  3. Analyze results β€” Check success/failure status
  4. Debug if needed β€” Use diagnose_dag_run and get_task_logs

Quick Test (Minimal)

# Ask user first, then:
trigger_dag_and_wait(dag_id="your_dag_id", timeout=300)

For the full test β†’ debug β†’ fix β†’ retest loop, see testing-dags.


Phase 6: Iterate

If issues found:
1. Fix the code
2. Check for import errors with list_import_errors MCP tool
3. Re-validate using MCP tools (Phase 4)
4. Re-test using the testing-dags skill workflow (Phase 5)

Never use CLI commands to check status or logs. Always use MCP tools.


MCP Tools Quick Reference

Phase Tool Purpose
Discover list_connections Available connections
Discover list_variables Configuration values
Discover list_providers Installed operators
Discover get_airflow_version Version info
Validate list_import_errors Parse errors (check first!)
Validate get_dag_details Verify DAG config
Validate list_dag_warnings Configuration warnings
Validate explore_dag Full DAG inspection

Testing tools β€” See the testing-dags skill for trigger_dag_and_wait, diagnose_dag_run, get_task_logs, etc.


Best Practices & Anti-Patterns

For detailed code examples and patterns, see reference/best-practices.md.

Key topics covered:
- TaskFlow API usage
- Credentials management (connections, variables)
- Provider operators
- Idempotency patterns
- Data intervals
- Task groups
- Setup/Teardown patterns
- Data quality checks
- Anti-patterns to avoid


  • testing-dags: For testing DAGs, debugging failures, and the test β†’ fix β†’ retest loop
  • debugging-dags: For troubleshooting failed DAGs
  • migrating-airflow-2-to-3: For migrating DAGs to Airflow 3

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.