dbt-labs

building-dbt-semantic-layer

# Install this skill:
npx skills add dbt-labs/dbt-agent-skills --skill "building-dbt-semantic-layer"

Install specific skill from multi-skill repository

# Description

Use when creating or modifying dbt Semantic Layer components including semantic models, metrics, and dimensions leveraging MetricFlow.

# SKILL.md


---
name: building-dbt-semantic-layer
description: Use when creating or modifying dbt Semantic Layer components including semantic models, metrics, and dimensions leveraging MetricFlow.
user-invocable: false
metadata:
  author: dbt-labs
---

Building the dbt Semantic Layer

This skill guides the creation and modification of dbt Semantic Layer components: semantic models, entities, dimensions, and metrics.

  • Semantic models - Metadata configurations that define how dbt models map to business concepts
  • Entities - Keys that identify the grain of your data and enable joins between semantic models
  • Dimensions - Attributes used to filter or group metrics (categorical or time-based)
  • Metrics - Business calculations defined on top of semantic models (e.g., revenue, order count)

Additional Resources

  • Time Spine Setup - Required for time-based metrics and aggregations
  • Best Practices - Design patterns and recommendations for semantic models and metrics

[!NOTE]
This skill contains guidance for the new dbt semantic layer YAML spec, valid for dbt 1.12.0 and above. If the user is using a different version of dbt, you can use the migration guide to help them migrate to the new spec and add new components to their semantic layer. Ask the user if they want to migrate to the new spec before proceeding.

Entry Points

Users may ask questions related to building metrics with the semantic layer in a few different ways. Here are the common entry points to look out for:

Business Question First

When the user describes a metric or analysis need (e.g., "I need to track customer lifetime value by segment"):

  1. Search project models or existing semantic models by name, description, and column names for relevant candidates
  2. Present top matches with brief context (model name, description, key columns)
  3. The user confirms which model(s) or semantic model(s) to build on, extend, or update
  4. Work backwards from the user's need to define entities, dimensions, and metrics

Model First

When the user specifies a model to expose (e.g., "Add semantic layer to customers model"):

  1. Read the model SQL and existing YAML config
  2. Identify the grain (primary key / entity)
  3. Suggest dimensions based on column types and names
  4. Ask what metrics the user wants to define

Both paths converge on the same implementation workflow.

Open Ended

The user asks to build the semantic layer without specifying a project or models (e.g., "Build the semantic layer for my project").

  1. Identify high-importance models in the project
  2. Suggest some metrics and dimensions for those models
  3. Ask the user whether they want to create more metrics and dimensions, or whether there are other models they want to build the semantic layer on

Implementation Workflow

Step 1: Enable Semantic Model

Decide which dbt model to build the semantic model on. Add a semantic_model: block to the model's YAML with enabled: true. Set agg_time_dimension to the primary time column. If the model does not have a time column, warn the user that the model cannot contain time-based metrics, and ask whether they want to create a derived time dimension.

Example YAML:

models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)

Step 2: Define Entities

Identify the primary key column (check for an _id suffix, uniqueness tests, or explicit config). Add an entity: block with type: primary to that column's entry. If the model has foreign keys, define those with entity: type: foreign.

models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:
      - name: order_id # this is the primary key column of the model
        entity: 
          type: primary 
          name: order
      - name: customer_id # this is a foreign key column of the model
        entity: 
          type: foreign 
          name: customer

Step 3: Define Dimensions

Scan columns for dimension candidates. These are columns that would be useful to group by when querying a metric:
- Time columns → dimension: type: time with appropriate granularity (set at the column level)
- Categorical columns (strings, booleans) → dimension: type: categorical

Present suggested dimensions to user for confirmation.

Example YAML:

models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:

      - name: order_id
        entity: 
          type: primary 
          name: order

      - name: customer_id
        entity: 
          type: foreign 
          name: customer

      - name: ordered_at
        granularity: day # set the granularity of the time column
        dimension:
          type: time

      - name: order_status
        dimension:
          type: categorical

Step 4: Define Metrics

Create some simple metrics for the model. For each metric, collect: name, description, label, aggregation type, and expression. Supported metric types: simple, derived, cumulative, conversion, and ratio.

models:
  - name: orders
    semantic_model:
      enabled: true # enable the semantic model
    agg_time_dimension: ordered_at # set the primary time column (this is a column in the dbt model)
    columns:

      - name: order_id
        entity: 
          type: primary 
          name: order

      - name: customer_id
        entity: 
          type: foreign 
          name: customer

      - name: ordered_at
        granularity: day # set the granularity of the time column
        dimension:
          type: time

      - name: order_status
        dimension:
          type: categorical

    metrics:
      - name: order_count
        type: simple
        agg: count
        expr: 1

      - name: total_revenue
        type: simple
        agg: sum
        expr: amount

      - name: average_order_value
        type: simple
        agg: avg
        expr: amount
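
If the user supplies a description and label, they can sit directly on the metric entry. A minimal sketch (the description and label wording here is illustrative, not taken from any project):

    metrics:
      - name: total_revenue
        type: simple
        description: Sum of order amounts # illustrative description
        label: Total Revenue # business-friendly label shown to end users
        agg: sum
        expr: amount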

Validation

After writing YAML, validate in two stages:

  1. Parse Validation: Run dbt parse to confirm YAML syntax and references
  2. Semantic Layer Validation: Run one of the following:
     • dbt sl validate (dbt Cloud CLI / dbt Fusion)
     • mf validate-configs (MetricFlow CLI)

Do not consider work complete until both validations pass.

Editing Existing Components

When modifying existing semantic layer config:

  • Check if the model's YAML already has a semantic_model: block
  • Read existing entities, dimensions, and metrics before making changes
  • Preserve all existing YAML content not being modified
  • After edits, run full validation to ensure nothing broke
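
For example, a minimal sketch of appending one new metric to a model that already has a semantic_model: block (the customers model, its columns, and the lifetime_value expression are hypothetical; everything already present stays untouched):

models:
  - name: customers
    semantic_model:
      enabled: true
    agg_time_dimension: first_ordered_at
    columns:
      - name: customer_id
        entity:
          type: primary
          name: customer
    metrics:
      # existing metric - left exactly as it was
      - name: customer_count
        type: simple
        agg: count
        expr: 1
      # new metric appended by this edit
      - name: total_lifetime_value
        type: simple
        agg: sum
        expr: lifetime_value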

YAML Format Reference

Derived Dimensions and Entities

If the user wants to create a derived dimension or entity that is not a column in the dbt model, use the derived_semantics block.

    derived_semantics:
      dimensions:
        - name: order_size_bucket
          type: categorical
          expr: case when amount > 100 then 'large' else 'small' end
          label: "Order Size"

      entities:
        - name: order_customer_key
          type: foreign
          expr: "order_id || '-' || customer_id"

Advanced Metric Examples

All simple metrics are defined at the model level under the metrics key. Advanced metrics that refer to simple metrics within the same model are also defined in the model's YAML entry under the models.metrics key. Advanced metrics that refer to simple metrics across different models are defined at the top level under the metrics key.

Derived Metrics

      - name: revenue_per_order
        type: derived
        description: Average revenue per order
        label: Revenue per Order
        expr: total_revenue / total_orders
        input_metrics:
          - name: total_revenue
          - name: total_orders

      # With offset window
      - name: revenue_growth
        type: derived
        expr: total_revenue - revenue_last_week
        input_metrics:
          - name: total_revenue
          - name: total_revenue
            alias: revenue_last_week
            offset_window: 1 week
            filter: "{{ Dimension('order__status') }} = 'completed'"

Cumulative Metrics

      - name: cumulative_revenue
        type: cumulative
        description: Running total of revenue
        label: Cumulative Revenue
        input_metric: total_revenue
        grain_to_date: week
        period_agg: first

      # With window
      - name: trailing_7d_revenue
        type: cumulative
        input_metric: total_revenue
        window: 7 days

Ratio Metrics

      - name: conversion_rate
        type: ratio
        description: Orders divided by visits
        label: Conversion Rate
        numerator: total_orders
        denominator: total_visits

      # With filters
      - name: premium_conversion_rate
        type: ratio
        numerator:
          name: total_orders
          filter: "{{ Dimension('order__customer_segment') }} = 'premium'"
          alias: premium_orders
        denominator: total_visits

Conversion Metrics

      - name: signup_to_purchase
        type: conversion
        description: Rate of signups converting to purchase
        label: Signup to Purchase
        entity: customer
        calculation: conversion_rate
        base_metric: signups
        conversion_metric: purchases
        window: 7 days
        constant_properties:
          - base_property: signup_channel
            conversion_property: purchase_channel

Top-level Metrics (Cross-model)

# For metrics depending on multiple semantic models
metrics:
  - name: cross_model_ratio
    type: ratio
    numerator:
      name: metric_from_model_a
      filter: "{{ Dimension('entity__dim') }} > 10"
    denominator:
      name: metric_from_model_b
    config:
      group: example_group
      tags:
        - example_tag
      meta:
        owner: "@someone"

Filtering Metrics

Filters can be added to simple metrics or to metric inputs of advanced metrics. A filter is a Jinja template that references an entity, dimension, or metric, followed by a boolean operator and a value. Ensure the value matches the type of the column being filtered.

Examples

filter: |
  {{ Entity('entity_name') }} = 'value'

filter: |
  {{ Dimension('primary_entity__dimension_name') }} > 100

filter: |
  {{ TimeDimension('time_dimension', 'granularity') }} > '2026-01-01'

filter: |
  {{ Metric('metric_name', group_by=['entity_name']) }} > 100
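
As a sketch of where a filter attaches on a simple metric (the metric name and dimension reference are illustrative; this assumes the filter key shown in the ratio and derived examples also applies to simple metrics, as stated above):

    metrics:
      - name: completed_revenue # hypothetical filtered metric
        type: simple
        agg: sum
        expr: amount
        filter: |
          {{ Dimension('order__order_status') }} = 'completed'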

Key Formatting Rules

  • semantic_model: block at model level with enabled: true
  • agg_time_dimension: at model level (not nested under semantic_model)
  • entity: and dimension: on columns (can use shorthand or full form)
  • granularity: required at column level for time dimensions
  • metrics: array at model level for single-model metrics
  • Top-level metrics: key for cross-model metrics (derived, ratio, cumulative, conversion only)

Best Practices

  • Start with entities - Identify the grain before defining dimensions or metrics
  • Use shorthand where possible - entity: primary instead of the full nested form for simple cases (see the sketch after this list)
  • Name metrics for business users - Use clear label values for non-technical users
  • Keep metrics close to their data - Simple metrics on their semantic model; top-level only for cross-model
  • Set appropriate granularity - Match the actual data grain (usually day)
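
A sketch of the shorthand versus the full form referenced above (only one of the two would appear on a given column; the entity name defaulting under the shorthand is an assumption):

    columns:
      # shorthand: entity type only
      - name: order_id
        entity: primary

      # full form: explicit type and name
      - name: order_id
        entity:
          type: primary
          name: order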

Common Pitfalls

| Pitfall | Fix |
| --- | --- |
| Missing agg_time_dimension | Every semantic model needs a default time dimension |
| granularity inside the dimension: block | Must be at the column level |
| Both entity and dimension on the same column | A column can only be one or the other |
| Simple metrics in top-level metrics: | Top-level is only for cross-model metrics |
| Using window and grain_to_date together | Cumulative metrics can only have one |
| Missing input_metrics on derived metrics | Must list the metrics used in expr |

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.