Snowflake query tags for enhanced monitoring
- Date
- Ian WhitestoneCo-founder & CEO of SELECT
Snowflake query tags allow users to associate arbitrary metadata with each query. In this post, we show how you can use query tags to achieve better visibility & monitoring for your Snowflake query costs and performance.
What is a query tag in Snowflake?
Query tags are an optional session level parameter that allow users to tag any Snowflake SQL statement with a string. They can be up to 2000 characters in length, and contain any characters. Query tag values for each query can be found in the output of the query_history
views in Snowflake, allowing them to be leveraged for a variety of use cases.
What about object tagging?
It's worth noting that query tags are different than object tags. Both accomplish the shared purpose of enabling more structured monitoring and improving visibility within your Snowflake account. However, object tags are used for persistent account objects like users, roles, tables, views, functions and more.
Why should you use query tags in Snowflake?
For most Snowflake users, compute costs from queries running in virtual warehouses will make up the vast majority of their Snowflake spend. While it is possible to attribute compute spend within a warehouse to different users by calculating cost per query, this is often not granular enough since a single production user account can generate the vast majority of queries, and costs.
Query tags enable more fine grained cost attribution. If you have a single SQL statement, or series of SQL statements associated with a data model in a pipeline, you can assign them the same query tag. Costs can then be easily attributed to all queries associated with the given tag. The alternative to this involves grouping by query_text
, which does not allow for multiple related SQL statements to be bucketed together. It also falls apart when the SQL text for a given data model inevitably gets changed.
Query tags can also be used for more granular query performance monitoring. Sticking with the example from earlier, users may wish to monitor the total runtime for each data model by grouping together the total elapsed time for all associated queries. Alternatively, if a set of queries are used to power user facing application dashboards, leveraging query tags can allow for more targeted performance monitoring.
Last of all, query tags provide the ability to link queries with metadata from other systems. A query tag could contain a dashboard_id
, which could enable users to aggregate all costs for a single dashboard, and then see how often that dashboard is used through the BI tool's metadata.
How do you use query tags in Snowflake?
Setting default tags
Query tags are a session level parameter, but defaults can be set at the account and user level. For example:
alter user shauna set query_tag = '{"team": "engineering", "user": "shauna"}';
Every query issued by this user will now have this default tag.
Setting query tags at the session level
Use the alter session
command to set the query tag. After this command is run, all subsequent queries run in the same session will be tagged with that string.
alter session set query_tag='users_model';
-- this query will be tagged with 'users_model'
create or replace table users_tmp as (
select *
from raw_users
where
not deleted
and created_at > current_date - 1
);
-- this will also be tagged with 'users_model'
insert into users
from users_tmp
;
alter session set query_tag='orders_model';
-- this will get tagged with 'orders_model'
create or replace table orders as (
select *
from raw_orders
where
not deleted
);
We recommend using user defaults wherever possible, to avoid frequent alter session
calls which add latency to the overall query execution.
Setting query tags in Python
If you are using Python to execute queries, there are two ways to set query tags.
Setting once upon connection creation
When you are creating your connection object with the Snowflake Python Connector, you can set any session parameters upfront. In the example below, all queries executed from this con
object will be tagged with DATA_MODELLING_PIPELINE
.
con = snowflake.connector.connect(
user='XXXX',
password='XXXX',
account='XXXX',
session_parameters={
'QUERY_TAG': 'DATA_MODELLING_PIPELINE',
}
)
Modifying manually through alter session
If you don't want all queries in your session to have the same tag, you can instead run alter session set query_tag = 'XXX'
prior to running your actual queries.
con.cursor().execute("alter session set query_tag='users_model'")
query = """
create or replace table users_tmp as (
select *
from raw_users
where
not deleted
and created_at > current_date - 1
)
"""
con.cursor().execute(query) # tagged with 'users_model'
con.cursor().execute("insert into users from users_tmp") # tagged with 'users_model'
con.cursor().execute("alter session set query_tag='orders_model'");
query = """
create or replace table orders as (
select *
from raw_orders
where
not deleted
)
"""
con.cursor().execute(query) # tagged with 'orders_model'
Setting query tags in dbt
If you are using dbt, there are three options for setting query tags:
- They can be set once in your
profiles.yml
(source). All queries run in your dbt project will then be tagged with that value. - Tags can be set for all models under a particular
resource_path
, or for a single model, by adding a+query_tag
in yourdbt_project.yml
. For individual models, you can also specify the query tag in the model config, i.e.{{ config(query_tag = 'XXX') }}
. If a default query tag has been set inprofiles.yml
, it will be overridden by any of these more precise tags. - You can create a
set_query_tag
macro which automatically sets the query tag to the model name for all models in your project.
Refer to the dbt documentation for examples of each of these options, and do make note of the potential failure mode they listed where queries can be set with an incorrect tag if specific failures occur upstream.
We have recently released a new dbt package dbt-snowflake-query-tags to tag all dbt-issued queries with a comprehensive set of metadata, check it out.
Using JSON strings
When setting query tags, we recommend using a JSON object for ease-of-use and consistency. Continuing with the data model tagging example from earlier, we could use a JSON object to additional information like the environment the model ran in, the version, the trigger for the data model to run (was it a scheduled run or manually invoked?), and more.
import json
query_tag = {
'app_name': 'pipeline',
'model_name': 'users',
'environment': 'prod',
'version': 'v1.2',
'trigger': 'schedule'
}
con.cursor().execute(f"alter session set query_tag='{json.dumps(query_tag)}'")
con.cursor().execute(model_sql)
How to use query tags for Snowflake cost & performance monitoring
Query tags are surfaced in the query history views for each query_id
. Here's an example query that shows average query performance by query_tag
:
select
query_tag,
count(*) as num_executions,
avg(total_elapsed_time/1000) as avg_total_elapsed_time_s
from snowflake.account_usage.query_history
where
start_time > current_date - 7
group by 1
If the query_tag
contains a JSON object, it can be parsed and segmented by any of the keys. Using the example from above:
select
try_parse_json(query_tag)['model_name']::string as model_name,
count(*) as num_executions,
avg(total_elapsed_time/1000) as avg_total_elapsed_time_s
from snowflake.account_usage.query_history
where
try_parse_json(query_tag)['app_name']::string = 'pipeline'
and start_time > current_date - 7
group by 1
Using the dbt-snowflake-monitoring package
If you are using SELECT's dbt package for cost & performance monitoring, you can analyze cost by query tag in addition to the performance:
select
try_parse_json(query_tag)['model_name']::string as model_name,
count(*) as num_executions,
sum(query_cost) as total_cost,
avg(total_elapsed_time_s) as avg_total_elapsed_time_s
from query_history_enriched
where
try_parse_json(query_tag)['app_name']::string = 'pipeline'
and start_time > current_date - 7
group by 1
You'll also find that these queries run much quicker than a query against Snowflake's account usage views, since the table is materialized and sorted by start_time
to achieve a well clustered state.
Using query comments instead of query tags
Another common practice for tagging queries is to add a comment to the bottom 1 of each query:
create or replace table orders as (
select *
from raw_orders
where
not deleted
);
-- '{"model_name": "orders", "environment": "prod", "version": "v1.2", "app_name": "pipeline", "trigger": "schedule"}'
This has the advantage of being universally applicable to all data warehouses, and is easier to implement since it doesn't require running an alter session
statement. Another desireable advantage is performance, since running an alter session
statement involves a round trip network call to Snowflake
2. This is fine for most use cases, but may not be acceptable in applications where an additional 100-200ms in response time matters. Lastly, since the query text can be up to 1MB in size, query comments can contain much more metadata than query tags, which are limited to 2000 characters.
Where possible, we recommend using query tags since they are much simpler to parse and analyze downstream. If it's possible for your query metadata to exceed 2000 characters, stick with query comments.
1 Snowflake automatically removes any comments at the beginning of each query, so you must append them to the end of the query.
2 The alter session
statement itself is extremely fast, taking about 30ms on average.