The ability to use different warehouse sizes for different workloads in Snowflake provides enormous value for performance and cost optimization. dbt natively integrates with Snowflake to allow specific warehouses to be chosen down to the model level. In this post, we explain exactly how to use this feature and share some best practices.
```yaml
models:
  +snowflake_warehouse: 'dbt_xsmall'
  my_project:
    clickstream:
      +snowflake_warehouse: 'dbt_large'
```
### Dynamic warehouse name based on environment

In many cases, the warehouse we want to use in production differs from the one we might want to use when developing, or in a CI or automated testing workflow. The same goes for the warehouses we're now configuring for individual models. To support this, we can use a macro in place of the literal warehouse value we had before:

```sql
{{ config(
    snowflake_warehouse=get_warehouse('large')
) }}

select ...
from {{ ref('stg_orders') }}
```
This macro can implement logic to return the desired warehouse size for the environment.
Suppose that in production, we have created warehouses named `dbt_production_<size>` (`dbt_production_xsmall`, `dbt_production_small`, `dbt_production_medium`, etc.), and for CI, we have warehouses named `dbt_ci_<size>`. For local development, we just want to use the default warehouse and ignore the configured warehouse size altogether. We also want to raise an error if the chosen warehouse size isn't in a managed list of available sizes. We can do that with the following macro logic:
```sql
{% macro get_warehouse(size) %}
    {% set available_sizes = ['xsmall', 'small', 'medium', 'large', 'xlarge', '2xlarge'] %}
    {% if size not in available_sizes %}
        {{ exceptions.raise_compiler_error("Warehouse size not one of " ~ available_sizes) }}
    {% endif %}
    {% if target.name in ('production', 'prod') %}
        {% do return('dbt_production_' ~ size) %}
    {% elif target.name == 'ci' %}
        {% do return('dbt_ci_' ~ size) %}
    {% else %}
        {% do return(None) %}
    {% endif %}
{% endmacro %}
```
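To make the branching behaviour concrete, here is a minimal Python sketch of the same resolution logic (a hypothetical `get_warehouse` function, not part of dbt; the environment names and warehouse naming convention are taken from the macro above):

```python
from typing import Optional

# Managed list of warehouse sizes we allow models to request.
AVAILABLE_SIZES = ['xsmall', 'small', 'medium', 'large', 'xlarge', '2xlarge']

def get_warehouse(size: str, target_name: str) -> Optional[str]:
    """Resolve a warehouse name from a requested size and the dbt target name."""
    if size not in AVAILABLE_SIZES:
        # Mirrors exceptions.raise_compiler_error in the macro.
        raise ValueError(f"Warehouse size not one of {AVAILABLE_SIZES}")
    if target_name in ('production', 'prod'):
        return f'dbt_production_{size}'
    if target_name == 'ci':
        return f'dbt_ci_{size}'
    # Local development: return None so the profile's default warehouse is used.
    return None
```

For example, `get_warehouse('large', 'prod')` resolves to `dbt_production_large`, while any non-production, non-CI target falls back to the default warehouse.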
Note that using a macro for the `snowflake_warehouse` config only works in model files; it cannot be used in `dbt_project.yml`.
It’s currently only possible to configure warehouses for models and snapshots. If you’re interested in keeping up to date on support for other resources such as tests, take a look at this GitHub issue.
Check out our dbt_snowflake_monitoring dbt package, which provides an easy-to-use `dbt_queries` model for understanding the performance of dbt models over time. It attributes costs to individual model runs, making it easy to answer questions like "what are the 10 costliest models in the last month?".
Thanks for reading! In an upcoming post we’ll share more recommendations for optimizing dbt performance on Snowflake. Make sure to subscribe for notifications on future posts, and feel free to reach out if you have any questions!
Niall is the Co-Founder & CTO of SELECT, a SaaS Snowflake cost management and optimization platform. Prior to starting SELECT, Niall was a data engineer at Brooklyn Data Company and several startups. As an open-source enthusiast, he's also a maintainer of SQLFluff, and the creator of three dbt packages: dbt_artifacts, dbt_snowflake_monitoring, and dbt_snowflake_query_tags.