Integrations

dbt

Feature overview

The SELECT dbt integration allows you better understand and optimize Snowflake spend attributed to dbt.

  • Dissect your dbt spend across a variety of different dimensions such as project, target, materialization and resource type.
  • Identify expensive models and tests, and understand how their cost and performance varies over time.
  • Drill into the execution history of any dbt resource and view query profile diagnostics for each run.
  • Receive tailored optimization recommendations to improve model cost and performance

dbt model timing chart

With SELECT's dbt integration, you can also understand the performance of each dbt run with our model timing chart, located on the Timeline tab of each dbt run page:

SELECT Snowflake dbt model run timing chart

A great feature of this chart is you can click straight into a given model's run to understand why it took the time it did using our version of the Snowflake query profile:

dbt runs view

Understand the performance and cost of your dbt project over time using the dbt runs view. Filter for specific commands, targets, users and more. Click into individual runs in the table or chart to view more detail and access the timing chart.

Setup

SELECT uses query tags and comment metadata to power this integration. To enable this integration, follow the configuration steps below.

Step 1: Update packages.yml

Add the following entry to your dbt project's packages.yml file in the root of the dbt project, creating the file if it doesn't exist:

packages:
- package: get-select/dbt_snowflake_query_tags
version: [">=2.0.0", "<3.0.0"]

Step 2: Adding the query tags

Option 1: If running dbt >= 1.2, simply configure the dispatch search order in dbt_project.yml.

dispatch:
- macro_namespace: dbt
search_order:
- <YOUR_PROJECT_NAME>
- dbt_snowflake_query_tags
- dbt

Option 2: If running dbt < 1.2, create a folder named macros in your dbt project's top level directory (if it doesn't exist). Inside, make a new file called query_tags.sql with the following content:

{% macro set_query_tag() -%}
{% do return(dbt_snowflake_query_tags.set_query_tag()) %}
{% endmacro %}
{% macro unset_query_tag(original_query_tag) -%}
{% do return(dbt_snowflake_query_tags.unset_query_tag(original_query_tag)) %}
{% endmacro %}

Step 3: Configure the query comments

Add the following config to dbt_project.yml.

query-comment:
comment: '{{ dbt_snowflake_query_tags.get_query_comment(node) }}'
append: true # Snowflake removes prefixed comments.

That's it! All dbt-issued queries will now be tagged.

Example dbt query metadata

This configuration adds metadata describing each dbt-issued query to a tag on the query. Once enabled, SELECT will begin to provide insights, analytics and recommendations on your dbt project. An example of the metadata added to each query is shown below:

{
"dbt_snowflake_query_tags_version": "2.0.0",
"app": "dbt",
"dbt_version": "1.4.0",
"project_name": "my_project",
"target_name": "dev",
"target_database": "dev",
"target_schema": "larry_goldings",
"invocation_id": "c784c7d0-5c3f-4765-805c-0a377fefcaa0",
"node_name": "orders",
"node_alias": "orders",
"node_package_name": "my_project",
"node_original_file_path": "models/staging/orders.sql",
"node_database": "dev",
"node_schema": "mart",
"node_id": "model.my_project.orders",
"node_resource_type": "model",
"materialized": "incremental",
"is_incremental": true
}

Step 4: All done!

That's all that's required for setup. The next time your dbt project runs, the generated queries will automatically be tagged with this metadata. SELECT will then process this query metadata and present it back to you in the web application.

Leveraging dbt meta & tags

Consider the example dbt_project.yml below, which implements a combination of dbt meta fields and tags.

# dbt_project.yml
name: 'select_dbt'
version: '1.0.0'
config-version: 2
models:
+meta:
team: 'data' # configures all models with team = 'data' by default
select_dbt: # when setting meta for specific paths, make sure to namespace with the project name
engineering:
+meta:
team: 'engineering'
marketing:
+meta:
team: 'marketing'
forecasts:
+meta:
team: 'finance'
+tags:
- "sensitive"
- "critical"
tests: # setting meta for singular tests (ones in .sql files)
+meta:
team: 'data' # configures all singular tests with team = 'data' by default
select_dbt: # when setting meta for specific paths, make sure to namespace with the project name
engineering:
+meta:
team: 'engineering'
...

The file structure for this dbt project would look something like what is shown below. All resources under the engineering directory would be get a team meta field added with the value engineering. Similarly, the one model in the forecasts directory would get a team field of finance added. It would also have tags: ["sensitive", "critical"].

.
├── dbt_project.yml
├── packages.yml
└── models/
├── engineering/
| ├── syncs.sql
| └── ai.sql
├── marketing/
| ├── attribition.sql
| └── page_clicks.sql
└── forecasts/
└── forecast.sql

To expose dbt meta fields in SELECT, head to the dbt settings and specify which fields you'd like to filter on. No such specification is required for dbt tags.

SELECT dbt meta keys settings

Once added, you can filter and slice by this metadata throught the UI!

Leverage dbt metadata in your Usage Groups

Looking to do cost chargebacks or showbacks using dbt metadata? You can now flexibly allocate costs using dbt meta fields and tags you've specified using SELECT's Usage Groups feature!

Accessing the dbt query metadata

If you would like to access the dbt query metadata created by our dbt package, you have two options.

Option 1: Snowflake Query History

You can access the dbt query metadata by parsing the query comments/tags our package adds by querying the Snowflake query history. Here is an example SQL script you can run in your account:

with
dbt_queries as (
select
start_time,
query_text,
execution_time/1000 as execution_time_s,
try_parse_json(query_tag) as dbt_query_tag_meta,
try_parse_json(regexp_substr(query_text, $$\/\*\s({"app":\s"dbt".*})\s\*\/$$, 1, 1, 'ie')) as dbt_comment_meta
from snowflake.account_usage.query_history
where
dbt_query_tag_meta is not null
and dbt_comment_meta is not null
and start_time > current_date - 7
)
select
start_time,
query_text,
execution_time_s,
dbt_comment_meta['invocation_id']::string as invocation_id,
dbt_comment_meta['node_id']::string as node_id,
dbt_comment_meta['node_name']::string as node_name,
dbt_comment_meta['node_resource_type']::string as resource_type,
dbt_comment_meta['project_name']::string as project_name,
dbt_comment_meta['target_name']::string as target_name,
dbt_comment_meta['target_database']::string as target_database,
dbt_comment_meta['target_schema']::string as target_schema,
dbt_comment_meta['node_package_name']::string as package_name,
dbt_comment_meta['materialized']::string as materialization,
dbt_query_tag_meta['is_incremental']::boolean as is_incremental
from dbt_queries
limit 50

Option 2: Via a data share

All data visible in the SELECT web application can be accessed through a secure data share, including a table named dbt_queries which includes all of the metadata (and more) shown above. See our data sharing documentation for more details.