Integrations

Custom Workloads

What are custom workloads?

Many Snowflake users leverage the platform for unique workloads, such as custom ETL pipelines, machine learning applications, or analytics applications. It's important for users to monitor these workloads to understand their cost and performance over time.

Feature overview

The SELECT custom workload integration allows you to better understand and optimize the Snowflake spend associated with these unique workloads.

  • Easily dissect your workload spend across a variety of different dimensions such as environment, team and tenant_id.
  • Identify expensive workloads, and understand how their cost and performance varies over time.
  • Drill into the execution history of any workload and view query profile diagnostics for each run.
  • Receive tailored optimization recommendations to improve workload cost and performance.
SELECT custom workloads integration dashboard overview

Integrating with custom workloads

To support custom workload monitoring (i.e. queries run by custom tooling), SELECT utilizes query metadata. The metadata is a JSON object that can be set in either the query tag or a query comment. The schema is as follows:

{
    # [REQUIRED]
    "select_schema_version": "1.0.0",
    # [REQUIRED]
    # identifiers
    "app": "generated_tables", # first grouping category
    "workload_id": "my_etl_job", # secondary grouping category
    # [OPTIONAL]
    # If a workload contains multiple queries, include a run_id to group them together
    # If run_id isn't present, each query is considered an individual run
    # This value might be a date but can be anything
    "run_id": "2020-01-01",
    # [OPTIONAL]
    # If specified, these will automatically appear in the UI as options you can filter & slice with
    "environment": "dev",
    "team": "data",
    # [OPTIONAL]
    # Custom tags - you can put whatever you want in here!
    # These fields can be used for slicing and filtering in the SELECT UI
    # after you've specified them in the UI under Settings -> Custom
    "tenant_id": "customer 1",
    "creator": "Hermione Granger",
    "description": "Processes the data",
    "link": "https://airflow.org.net/dag/my_etl_job",
    # [OPTIONAL]
    # You can add complex data types, but we currently don't support using these in the UI
    "tags": ["critical", "sensitive"],
    "meta": {"task_key": "A1SF249X", "attempt_number": 1}
}
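Because the metadata must be valid JSON with the three required keys present, it can help to build it programmatically rather than by hand. Below is a minimal sketch; `REQUIRED_KEYS` and `build_metadata` are our own names, not part of SELECT's API:

```python
import json

# The keys the schema above marks as [REQUIRED]
REQUIRED_KEYS = {"select_schema_version", "app", "workload_id"}

def build_metadata(**fields) -> str:
    """Serialize workload metadata, failing fast if required keys are missing."""
    missing = REQUIRED_KEYS - fields.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return json.dumps(fields)

metadata = build_metadata(
    select_schema_version="1.0.0",
    app="generated_tables",
    workload_id="my_etl_job",
    environment="dev",
)
```

Failing fast on a missing key means a malformed comment never reaches Snowflake, where it would silently go untracked.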

Example query comments

Both multi-line and single-line comment syntax are supported. If using single-line comments, ensure that the final character in the comment is a closing curly brace and not a semi-colon.

Multi-line comment

select * from my_table
/*
{
"select_schema_version": "1.0.0",
"app": "ingest",
"workload_id": "facebook_ads"
}
*/

Single-line comment:

select * from my_table
-- {"select_schema_version": "1.0.0", "app": "ingest", "workload_id": "facebook_ads"}
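To guard against the trailing-semicolon pitfall described above, a small helper can strip any statement terminator before appending the single-line comment. This is a sketch; `append_metadata_comment` is our own helper name:

```python
import json

def append_metadata_comment(sql: str, metadata: dict) -> str:
    """Append metadata as a single-line comment, keeping the closing brace last."""
    # A semicolon after the comment would mean the final character is not
    # the closing curly brace, so strip any terminator first.
    sql = sql.rstrip().rstrip(";").rstrip()
    return f"{sql}\n-- {json.dumps(metadata)}"

tagged = append_metadata_comment(
    "select * from my_table;",
    {"select_schema_version": "1.0.0", "app": "ingest", "workload_id": "facebook_ads"},
)
```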

Example query tag

Alternatively, query tags can be used. The downside to using query tags is the need to both set and unset the query tag in the session, adding latency and additional complexity.

{"select_schema_version": "1.0.0", "app": "ingest", "workload_id": "facebook_ads"}
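One way to manage the set/unset dance is to build the full statement sequence up front and run each statement through your cursor. A sketch, assuming a `snowflake.connector` cursor; `query_tag_statements` is our own helper name:

```python
import json

def query_tag_statements(metadata: dict, sql: str) -> list:
    """Wrap a query with the statements that set and unset the session query tag."""
    tag = json.dumps(metadata)  # json.dumps emits double quotes, safe inside ''
    return [
        f"alter session set query_tag = '{tag}'",
        sql,
        "alter session unset query_tag",
    ]

statements = query_tag_statements(
    {"select_schema_version": "1.0.0", "app": "ingest", "workload_id": "facebook_ads"},
    "select * from my_table",
)
# With a cursor: for stmt in statements: cursor.execute(stmt)
```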

Example Python implementation using query comments

import json
import snowflake.connector

# Connect to Snowflake & create a cursor
conn = snowflake.connector.connect(
    user='<your_username>',
    password='<your_password>',
    account='<your_account>',
)
cursor = conn.cursor()

# Generate query comment
select_query_comment = {
    "select_schema_version": "1.0.0",
    "app": "data_transformations",
    "workload_id": "customers",
    "environment": "prod"
}
select_query_comment_str = f"\n\n/* {json.dumps(select_query_comment)} */"

# Execute query with comment
query = """
create or replace table db.analytics.customers as (
    select *
    from raw_customers
    where
        not deleted
        and created_at > '2019-01-01'
)
"""
cursor.execute(query + select_query_comment_str)

Leveraging custom metadata fields

Custom workloads allow users to tag their queries with comments or Snowflake query tags. Here’s an example query comment SELECT has for one of its custom workloads:

select ...
from queries
group by all
order by sum_query_cost desc nulls last
limit 1000
/* {
    "select_schema_version": "1.0.0",
    "app": "SELECT_WEB_APP",
    "workload_id": "DailyWorkloadPoPQuery",
    "environment": "prod",
    "tenant_id": "scwxhob-ad38017",
    "referrer_base_url": "https://select.dev/app/scwxhob-ad38017/workloads",
    "is_admin_user": true,
    "user_id": "google-oauth2|115381118870474481840"
} */

Custom workload fields like app, workload_id and environment are already available for filtering & slicing in the UI out of the box. To enable the same behaviour for other fields you specify in your custom workload query comment/tag, head to Settings → Custom and specify the keys you want enabled.

SELECT custom workload meta keys settings

Once added, you can filter and slice by this metadata throughout the UI!

Leverage custom meta keys in your Usage Groups

Looking to do cost chargebacks or showbacks using the metadata you've added to your query? You can now flexibly allocate costs based on custom workload metadata using SELECT's Usage Groups feature!

We don't currently support keys with complex data types

We currently do not support filtering/slicing on custom meta keys which contain complex data types like arrays or objects. If this is a requirement for you, please let us know and we will add this functionality.

select * from credit_cards
/* {
    "select_schema_version": "1.0.0",
    "app": "generated_tables",
    "workload_id": "my_etl_job",
    "tags": ["critical", "sensitive"],
    "meta": {"task_key": "A1SF249X", "attempt_number": 1}
} */

Note, you can still add this data to your query comment/tag and perform your own analysis on it. You just won't be able to interact with that data in the SELECT UI as of today.
