Snowflake query tags for enhanced monitoring

Date: Wednesday, February 8, 2023

Ian Whitestone
Co-founder & CEO of SELECT

Snowflake query tags allow users to associate arbitrary metadata with each query. In this post, we show how you can use query tags to achieve better visibility & monitoring for your Snowflake query costs and performance.

What is a query tag in Snowflake?

Query tags are an optional session level parameter that allow users to tag any Snowflake SQL statement with a string. They can be up to 2000 characters in length, and contain any characters. Query tag values for each query can be found in the output of the query_history views in Snowflake, allowing them to be leveraged for a variety of use cases.

It's worth noting that query tags are different than object tags. Both accomplish the shared purpose of enabling more structured monitoring and improving visibility within your Snowflake account. However, object tags are used for persistent account objects like users, roles, tables, views, functions and more.

Why should you use query tags in Snowflake?

For most Snowflake users, compute costs from queries running in virtual warehouses will make up the vast majority of their Snowflake spend. While it is possible to attribute compute spend within a warehouse to different users by calculating cost per query, this is often not granular enough since a single production user account can generate the vast majority of queries, and costs.

Query tags enable more fine grained cost attribution. If you have a single SQL statement, or series of SQL statements associated with a data model in a pipeline, you can assign them the same query tag. Costs can then be easily attributed to all queries associated with the given tag. The alternative to this involves grouping by query_text, which does not allow for multiple related SQL statements to be bucketed together. It also falls apart when the SQL text for a given data model inevitably gets changed.

Query tags can also be used for more granular query performance monitoring. Sticking with the example from earlier, users may wish to monitor the total runtime for each data model by grouping together the total elapsed time for all associated queries. Alternatively, if a set of queries are used to power user facing application dashboards, leveraging query tags can allow for more targeted performance monitoring.

Last of all, query tags provide the ability to link queries with metadata from other systems. A query tag could contain a dashboard_id, which could enable users to aggregate all costs for a single dashboard, and then see how often that dashboard is used through the BI tool's metadata.

How do you use query tags in Snowflake?

Setting default tags

Query tags are a session level parameter, but defaults can be set at the account and user level. For example:

alter user shauna set query_tag = '{"team": "engineering", "user": "shauna"}';

Every query issued by this user will now have this default tag.

Setting query tags at the session level

Use the alter session command to set the query tag. After this command is run, all subsequent queries run in the same session will be tagged with that string.

alter session set query_tag='users_model';

-- this query will be tagged with 'users_model'
create or replace table users_tmp as (
  select *
  from raw_users
  where
    not deleted
    and created_at > current_date - 1
);

-- this will also be tagged with 'users_model'
insert into users
from users_tmp
;

We recommend using user defaults wherever possible, to avoid frequent alter session calls which add latency to the overall query execution.

Setting query tags in Python

If you are using Python to execute queries, there are two ways to set query tags.

Setting once upon connection creation

When you are creating your connection object with the Snowflake Python Connector, you can set any session parameters upfront. In the example below, all queries executed from this con object will be tagged with DATA_MODELLING_PIPELINE.

con = snowflake.connector.connect(
    user='XXXX',
    password='XXXX',
    account='XXXX',
    session_parameters={
        'QUERY_TAG': 'DATA_MODELLING_PIPELINE',
    }
)

Modifying manually through `alter session`

If you don't want all queries in your session to have the same tag, you can instead run alter session set query_tag = 'XXX' prior to running your actual queries.

con.cursor().execute("alter session set query_tag='users_model'")

query = """
create or replace table users_tmp as (
  select *
  from raw_users
  where
    not deleted
    and created_at > current_date - 1
)
"""

con.cursor().execute(query) # tagged with 'users_model'
con.cursor().execute("insert into users from users_tmp") # tagged with 'users_model'

Setting query tags in dbt

If you are using dbt, there are three options for setting query tags:

They can be set once in your profiles.yml (source). All queries run in your dbt project will then be tagged with that value.
Tags can be set for all models under a particular resource_path, or for a single model, by adding a +query_tag in your dbt_project.yml. For individual models, you can also specify the query tag in the model config, i.e. {{ config(query_tag = 'XXX') }}. If a default query tag has been set in profiles.yml, it will be overridden by any of these more precise tags.
You can create a set_query_tag macro which automatically sets the query tag to the model name for all models in your project.

Refer to the dbt documentation for examples of each of these options, and do make note of the potential failure mode they listed where queries can be set with an incorrect tag if specific failures occur upstream.

We have recently released a new dbt package dbt-snowflake-query-tags to tag all dbt-issued queries with a comprehensive set of metadata, check it out.

Using JSON strings

When setting query tags, we recommend using a JSON object for ease-of-use and consistency. Continuing with the data model tagging example from earlier, we could use a JSON object to add additional information like the environment the model ran in, the version, the trigger for the data model to run (was it a scheduled run or manually invoked?), and more.

import json

query_tag = {
  'app_name': 'pipeline',
  'model_name': 'users',
  'environment': 'prod',
  'version': 'v1.2',
  'trigger': 'schedule'
}
con.cursor().execute(f"alter session set query_tag='{json.dumps(query_tag)}'")

con.cursor().execute(model_sql)

How to use query tags for Snowflake cost & performance monitoring

Query tags are surfaced in the query history views for each query_id. Here's an example query that shows average query performance by query_tag:

If the query_tag contains a JSON object, it can be parsed and segmented by any of the keys. Using the example from above:

select
    query_tag,
    count(*) as num_executions,
    avg(total_elapsed_time/1000) as avg_total_elapsed_time_s
from snowflake.account_usage.query_history
where
    start_time > current_date - 7
group by 1

Using the dbt-snowflake-monitoring package

If you are using SELECT's dbt package for cost & performance monitoring, you can analyze cost by query tag in addition to the performance:

select
    try_parse_json(query_tag)['model_name']::string as model_name,
    count(*) as num_executions,
    avg(total_elapsed_time/1000) as avg_total_elapsed_time_s
from snowflake.account_usage.query_history
where
    try_parse_json(query_tag)['app_name']::string = 'pipeline'
    and start_time > current_date - 7
group by 1

You'll also find that these queries run much quicker than a query against Snowflake's account usage views, since the table is materialized and sorted by start_time to achieve a well clustered state.

Using query comments instead of query tags

Another common practice for tagging queries is to add a comment to the bottom¹ of each query:

select
    try_parse_json(query_tag)['model_name']::string as model_name,
    count(*) as num_executions,
    sum(query_cost) as total_cost,
    avg(total_elapsed_time_s) as avg_total_elapsed_time_s
from query_history_enriched
where
    try_parse_json(query_tag)['app_name']::string = 'pipeline'
    and start_time > current_date - 7
group by 1

This has the advantage of being universally applicable to all data warehouses, and is easier to implement since it doesn't require running an alter session statement. Another desirable advantage is performance, since running an alter session statement involves a round trip network call to Snowflake². This is fine for most use cases, but may not be acceptable in applications where an additional 100-200ms in response time matters. Lastly, since the query text can be up to 1MB in size, query comments can contain much more metadata than query tags, which are limited to 2000 characters.

Where possible, we recommend using query tags since they are much simpler to parse and analyze downstream. If it's possible for your query metadata to exceed 2000 characters, stick with query comments.

Notes

¹Snowflake automatically removes any comments at the beginning of each query, so you must append them to the end of the query. ↩

²The alter session statement itself is extremely fast, taking about 30ms on average. ↩

Snowflake query tags for enhanced monitoring

What is a query tag in Snowflake?

Why should you use query tags in Snowflake?

How do you use query tags in Snowflake?

Setting default tags

Setting query tags at the session level

Setting query tags in Python

Setting once upon connection creation

Modifying manually through `alter session`

Setting query tags in dbt

Using JSON strings

How to use query tags for Snowflake cost & performance monitoring

Using the dbt-snowflake-monitoring package

Using query comments instead of query tags

Notes

Optimize your Snowflake usage

Snowflake optimization & cost management platform

What is a query tag in Snowflake?

Why should you use query tags in Snowflake?

How do you use query tags in Snowflake?

Setting default tags

Setting query tags at the session level

Setting query tags in Python

Setting once upon connection creation

Modifying manually through alter session

Setting query tags in dbt

Using JSON strings

How to use query tags for Snowflake cost & performance monitoring

Using the dbt-snowflake-monitoring package

Using query comments instead of query tags

Notes

Optimize your Snowflake usage

Snowflake optimization & cost management platform

Modifying manually through `alter session`