dbt-labs

answering-natural-language-questions-with-dbt

# Install this skill:
```bash
npx skills add dbt-labs/dbt-agent-skills --skill "answering-natural-language-questions-with-dbt"
```

This installs a specific skill from a multi-skill repository.

# Description

Use when a user asks a business question that requires querying data (e.g., "What were total sales last quarter?"). NOT for validating, testing, or building dbt models during development.

# SKILL.md


---
name: answering-natural-language-questions-with-dbt
description: Use when a user asks a business question that requires querying data (e.g., "What were total sales last quarter?"). NOT for validating, testing, or building dbt models during development.
user-invocable: false
metadata:
  author: dbt-labs
---

# Answering Natural Language Questions with dbt

## Overview

Answer data questions using the best available method: semantic layer first, then SQL modification, then model discovery, then manifest analysis. Always exhaust options before saying "cannot answer."

Use for: Business questions from users that need data answers
- "What were total sales last month?"
- "How many active customers do we have?"
- "Show me revenue by region"

Not for:
- Validating model logic during development
- Testing dbt models or semantic layer definitions
- Building or modifying dbt models
- dbt run, dbt test, or dbt build workflows

## Decision Flow

```mermaid
flowchart TD
    start([Business question received])
    check_sl{Semantic layer tools available?}
    list_metrics[list_metrics]
    metric_exists{Relevant metric exists?}
    get_dims[get_dimensions]
    sl_sufficient{SL can answer directly?}
    query_metrics[query_metrics]
    answer([Return answer])
    try_compiled[get_metrics_compiled_sql<br/>Modify SQL, execute_sql]
    check_discovery{Model discovery tools available?}
    try_discovery[get_mart_models<br/>get_model_details<br/>Write SQL, execute]
    check_manifest{In dbt project?}
    try_manifest[Analyze manifest/catalog<br/>Write SQL]
    cannot([Cannot answer])
    suggest{In dbt project?}
    improvements[Suggest semantic layer changes]
    done([Done])

    start --> check_sl
    check_sl -->|yes| list_metrics
    check_sl -->|no| check_discovery
    list_metrics --> metric_exists
    metric_exists -->|yes| get_dims
    metric_exists -->|no| check_discovery
    get_dims --> sl_sufficient
    sl_sufficient -->|yes| query_metrics
    sl_sufficient -->|no| try_compiled
    query_metrics --> answer
    try_compiled -->|success| answer
    try_compiled -->|fail| check_discovery
    check_discovery -->|yes| try_discovery
    check_discovery -->|no| check_manifest
    try_discovery -->|success| answer
    try_discovery -->|fail| check_manifest
    check_manifest -->|yes| try_manifest
    check_manifest -->|no| cannot
    try_manifest -->|SQL ready| answer
    answer --> suggest
    cannot --> done
    suggest -->|yes| improvements
    suggest -->|no| done
    improvements --> done
```

## Quick Reference

| Priority | Condition | Approach | Tools |
|---|---|---|---|
| 1 | Semantic layer active | Query metrics directly | list_metrics, get_dimensions, query_metrics |
| 2 | SL active but minor modifications needed (missing dimension, custom filter, case when, different aggregation) | Modify compiled SQL | get_metrics_compiled_sql, then execute_sql |
| 3 | No SL, discovery tools active | Explore models, write SQL | get_mart_models, get_model_details, then show/execute_sql |
| 4 | No MCP, in dbt project | Analyze artifacts, write SQL | Read target/manifest.json, target/catalog.json |

## Approach 1: Semantic Layer Query

When list_metrics and query_metrics are available:

  1. list_metrics - find relevant metric
  2. get_dimensions - verify required dimensions exist
  3. query_metrics - execute with appropriate filters
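
As a rough sketch, answering "What were total sales last quarter?" might flow like this. The metric, dimension, and argument names below are assumptions for illustration, not the exact dbt MCP tool signatures:

```
# Hypothetical flow; metric, dimension, and argument names are assumptions
list_metrics()
#   -> returns a metric list that includes total_revenue

get_dimensions(metrics=["total_revenue"])
#   -> confirms metric_time is available, so quarterly grouping is possible

query_metrics(metrics=["total_revenue"],
              group_by=["metric_time"],        # at quarter grain
              where="<previous-quarter filter>")
#   -> returns the aggregated number to report back to the user
```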

If semantic layer can't answer directly (missing dimension, need custom logic) → go to Approach 2.

## Approach 2: Modified Compiled SQL

When the semantic layer has the metric but the query needs minor modifications, such as:

- Missing dimension (join + group by)
- Custom filter not available as a dimension
- Case when logic for custom categorization
- Different aggregation than what's defined

Process:

1. get_metrics_compiled_sql - get the SQL that would run (returns raw SQL, not Jinja)
2. Modify the SQL to add what's needed
3. execute_sql to run the modified SQL
4. Always suggest updating the semantic model if the modification would be reusable

```sql
-- Example: Adding sales_rep dimension
WITH base AS (
    -- ... compiled metric logic (already resolved to table names) ...
)
SELECT base.*, reps.sales_rep_name
FROM base
JOIN analytics.dim_sales_reps reps ON base.rep_id = reps.id
GROUP BY ...

-- Example: Custom filter
SELECT * FROM (compiled_metric_sql) WHERE region = 'EMEA'

-- Example: Case when categorization
SELECT
    CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END as deal_size,
    SUM(amount)
FROM (compiled_metric_sql)
GROUP BY 1
```

Note: The compiled SQL contains resolved table names, not {{ ref() }}. Work with the raw SQL as returned.

## Approach 3: Model Discovery

When there is no semantic layer but get_all_models/get_model_details are available:

  1. get_mart_models - start with marts, not staging
  2. get_model_details for relevant models - understand schema
  3. Write SQL using {{ ref('model_name') }}
  4. show --inline "..." or execute_sql

Prefer marts over staging - marts have business logic applied.
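
For example, if discovery surfaced a hypothetical mart model fct_orders with order_date and amount columns (names are assumptions for illustration), the query might look like this:

```sql
-- Hypothetical query against a discovered mart model; model and column names are illustrative
select
    date_trunc('month', order_date) as order_month,
    sum(amount) as total_sales
from {{ ref('fct_orders') }}
group by 1
order by 1
```

Run it via step 4 (show --inline or execute_sql).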

## Approach 4: Manifest/Catalog Analysis

When in a dbt project but no MCP server:

1. Check for target/manifest.json and target/catalog.json
2. Filter before reading - these files can be large:

```bash
# Find mart models in manifest
jq '.nodes | to_entries | map(select(.key | startswith("model.") and contains("mart"))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json

# Get column info from catalog
jq '.nodes["model.project_name.model_name"].columns' target/catalog.json
```

3. Write SQL based on the discovered schema
4. Explain: "This SQL should run in your warehouse. I cannot execute it without database access."
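
For instance, if the manifest showed a mart model materialized as analytics.fct_orders with region, amount, and order_date columns (all names here are assumptions), the handed-off SQL references the physical relation directly, since {{ ref() }} cannot be compiled without dbt:

```sql
-- Hypothetical SQL built from manifest/catalog metadata;
-- schema, model, and column names are assumptions, not verified objects
select
    region,
    sum(amount) as total_sales
from analytics.fct_orders
where order_date >= dateadd(month, -3, current_date)  -- adjust date logic for your warehouse dialect
group by region
```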

## Suggesting Improvements

When in a dbt project, suggest semantic layer changes after answering (or when you cannot answer):

| Gap | Suggestion |
|---|---|
| Metric doesn't exist | "Add a metric definition to your semantic model" |
| Dimension missing | "Add dimension_name to the dimensions list in the semantic model" |
| No semantic layer | "Consider adding a semantic layer for this data" |
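
For a missing dimension, the suggestion can include a concrete sketch. The snippet below is a hypothetical fragment of a dbt semantic model (model and dimension names are illustrative, and the surrounding entities/measures are omitted):

```yaml
semantic_models:
  - name: orders
    model: ref('fct_orders')            # hypothetical mart model
    dimensions:
      - name: region
        type: categorical
      # Suggested addition so the semantic layer can answer "revenue by sales rep":
      - name: sales_rep
        type: categorical
```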

Stay at semantic layer level. Do NOT suggest:
- Database schema changes
- ETL pipeline modifications
- "Ask your data engineering team to..."

## Rationalizations to Resist

| You're Thinking... | Reality |
|---|---|
| "Semantic layer doesn't support this exact query" | Get compiled SQL and modify it (Approach 2) |
| "No MCP tools, can't help" | Check for manifest/catalog locally |
| "User needs this quickly, skip the systematic check" | The systematic approach IS the fastest path |
| "Just write SQL, it's faster" | The semantic layer exists for a reason - use it first |
| "The dimension doesn't exist in the data" | Maybe it exists but not in the semantic layer config |

## Red Flags - STOP

- Writing SQL without checking whether the semantic layer can answer
- Saying "cannot answer" without trying all 4 approaches
- Suggesting database-level fixes for semantic layer gaps
- Reading entire manifest.json without filtering
- Using staging models when mart models exist
- Using this skill to validate model correctness rather than answer business questions

## Common Mistakes

| Mistake | Fix |
|---|---|
| Giving up when the SL can't answer directly | Get compiled SQL and modify it |
| Querying staging models | Use get_mart_models first |
| Reading full manifest.json | Use jq to filter |
| Suggesting ETL changes | Keep suggestions at the semantic layer |
| Not checking tool availability | List available tools before choosing an approach |

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.