
Using the Snowflake Query History: 9 Practical Examples

Ian Whitestone
Co-founder & CEO of SELECT

One of my favorite aspects of Snowflake is all the metadata it makes programmatically accessible to all users, at no extra cost! The shared SNOWFLAKE database is packed with hundreds of different views you can query via SQL to answer practically any question about your Snowflake usage.

The Query History dataset is one of the most widely used in the Snowflake database, as it contains a wealth of useful metadata about every query run in your account. The Query History can power a variety of use cases, like analyzing historical query runtimes, finding expensive queries, or analyzing query failure rates.

In this post, I’ll share 9 real-world examples you can run in your account today.

Accessing the Query History

Before jumping into the examples of how to use the query history, let’s quickly cover the 3 different ways you can access it.

Through the Snowsight UI

The easiest way to get started with using the Query History dataset is through the Snowsight UI, under the Monitoring tab.

From the UI, you can apply several different filters, and view additional columns.

It’s important to note that the UI only contains queries from the last 14 days.

The UI is most helpful for quickly checking queries that recently ran or are currently running. Once your use case requires more flexible filtering or additional data, you’ll typically move to one of the next two methods.

Information Schema Table Functions (information_schema.query_history)

The same data shown in the UI is available via the information_schema.query_history() table function. Like the UI, you can only access data from the last 14 days. On the plus side, you can retrieve data for queries that are currently running!

Here’s an example to retrieve the last 100 queries run in your account:

select *
from table(snowflake.information_schema.query_history())
order by start_time
;

If you need more than that, you can return up to 10,000 records using the RESULT_LIMIT argument (the default is 100):

select *
from table(snowflake.information_schema.query_history(result_limit=>10000))
order by start_time
;

You can specify a custom time range using the end_time_range_start and end_time_range_end parameters.

select
*
from table(snowflake.information_schema.query_history(
end_time_range_start=>dateadd('hours',-1,current_timestamp()),
end_time_range_end=>current_timestamp(),
result_limit=>10000
))
order by start_time
;

You can also restrict which columns are returned, or add filters. It’s important to note that any filters you add are applied after the fact, meaning Snowflake will first retrieve up to 10,000 records and then filter them.

select
query_id,
query_text,
user_name,
execution_status,
start_time,
end_time
from table(snowflake.information_schema.query_history(result_limit=>10000))
where
user_name='IAN'
order by start_time

If possible, you should rely on the other available table functions when performing filtering:

  • QUERY_HISTORY_BY_SESSION
  • QUERY_HISTORY_BY_USER
  • QUERY_HISTORY_BY_WAREHOUSE

The example above could be re-written as:

select
query_id,
query_text,
user_name,
execution_status,
start_time,
end_time
from table(snowflake.information_schema.query_history_by_user(user_name=>'IAN', result_limit=>10000))
order by start_time

Latency & Data Retention

The main advantage of using the information_schema.query_history() table function is that it has no latency. If a query completed one second ago, it will show up in the output of this function. The same goes for a query that is still running.
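
For example, here’s a quick sketch showing how you could list the queries currently executing in your account (this assumes the RUNNING value of the execution_status column returned by the table function):

select
query_id,
query_text,
user_name,
warehouse_name,
start_time
from table(snowflake.information_schema.query_history(result_limit=>10000))
where execution_status = 'RUNNING' -- add 'QUEUED' here if you also want queued queries
order by start_time
;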

The main downsides of using this dataset are that:

  • It only contains queries from the last 14 days
  • You can only analyze the results of up to 10,000 queries per call
  • Table functions are less intuitive to work with than querying a view/table

If you need to analyze older queries or analyze trends over a longer time period, you’ll have to rely on the account_usage view, discussed below.

Account Usage View (account_usage.query_history)

The snowflake.account_usage.query_history view is my preferred way of analyzing & accessing the Query History dataset. You don’t need to remember how to work with table functions, and you can flexibly analyze data from the last year.

Here’s an example query where I count the number of queries from a user in the last 30 days:

select count(*)
from snowflake.account_usage.query_history
where
start_time > current_date - 30
and user_name='IAN'

For a full list of columns available, consult the Snowflake documentation.

Latency & Data Retention

The snowflake.account_usage.query_history view can have a latency of up to 45 minutes. With that said, you’ll often see data much sooner in practice.
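
If you want a rough sense of how far behind the view currently is, one simple sketch is to compare the most recent end_time in the view to the current timestamp (this is only a proxy, since it assumes queries are finishing continuously in your account):

select
max(end_time) as latest_query_end_time,
datediff('minute', max(end_time), current_timestamp()) as approx_lag_minutes
from snowflake.account_usage.query_history
where start_time > dateadd('hour', -6, current_timestamp())
;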

In terms of data retention, this view contains data from the last 365 days.

Are all queries included in the Query History?

Generally speaking, all queries you run in Snowflake will show up in the Query History. This includes programmatically executed queries, queries run through the UI, queries run by tasks or stored procedures, and queries executed by Streamlit dashboards. The only queries excluded from the Query History are short-running queries that operate exclusively against hybrid tables. If you need to analyze these queries, you’ll have to rely on the snowflake.account_usage.aggregate_query_history view.
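
If you do need to dig into those hybrid table queries, a minimal sketch against that view might look like the following (this assumes the interval_start_time column exposed by the view; check the Snowflake documentation for the full column list):

select *
from snowflake.account_usage.aggregate_query_history
where interval_start_time > dateadd('day', -7, current_timestamp())
limit 100
;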

Query History Data Retention

The Query History data retention period depends on which method you use to access the dataset:

  • Snowsight UI: 14 days
  • Information Schema Query History Table Function: 14 days
  • Account Usage Query History View: 365 days

Now that we’ve covered the basics for Query History and how to access it, let’s dive into some real examples! Everything covered below will rely on the account usage view discussed above.

1. Identify longest-running queries

To identify the longest-running queries in the last 30 days, you can sort by the total_elapsed_time column:

select
query_id,
total_elapsed_time/1000 as total_elapsed_time_s, -- convert to seconds
user_name,
query_text
from snowflake.account_usage.query_history
where
start_time > current_date - 30
order by total_elapsed_time desc

2. Analyze query performance over time

Here’s an example to analyze the average and 90th percentile query performance each hour in a given warehouse.

select
date_trunc('hour', start_time) as start_time,
avg(total_elapsed_time)/1000 as avg_total_elapsed_time_s,
approx_percentile(total_elapsed_time, 0.90)/1000 as p90_total_elapsed_time_s
from snowflake.account_usage.query_history
where
warehouse_name='SELECT_BACKEND'
and start_time between '2024-03-15' and '2024-03-31'
group by 1
;

You can even get a nice visualization directly in Snowsight!

3. Identify recurring query patterns with query_parameterized_hash

Snowflake recently (in 2023) introduced a new column called query_parameterized_hash. This value is created by removing literals from the query and then hashing the query text. The motivation for this is to help you identify recurring queries that only differ slightly due to their parameter values. You can learn more about query_parameterized_hash, along with its limitations (it only works with certain comparison operations), in the Snowflake documentation.
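
To see this in action, you can run two queries that differ only in a literal and then confirm they produce the same hash. A quick sketch, where orders is a placeholder table name:

-- Run two queries that differ only in a literal value
select * from orders where order_id = 100;
select * from orders where order_id = 200;

-- Both should show up with the same query_parameterized_hash
-- (allow for the account_usage latency discussed earlier)
select query_text, query_parameterized_hash
from snowflake.account_usage.query_history
where query_text ilike 'select * from orders where order_id%'
order by start_time desc
;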

Here’s an example to identify the top 100 query patterns based on the total elapsed time. The example also pulls the latest query text, user name and warehouse associated with the query parameterized hash.

select
query_parameterized_hash,
count(*) as num_executions,
avg(total_elapsed_time)/1000 as avg_total_elapsed_time_s,
sum(total_elapsed_time)/1000 as total_elapsed_time_s,
max_by(query_text, start_time) as latest_query_text,
max_by(user_name, start_time) as latest_user_name,
max_by(warehouse_name, start_time) as latest_warehouse_name
from snowflake.account_usage.query_history
where
start_time > current_date - 7
group by 1
order by total_elapsed_time_s desc
limit 100
;

4. Top queries by query type

Snowflake includes a query_type column to help you differentiate between the different types of queries run in your account. Examples include SELECT, CREATE_TABLE_AS_SELECT, INSERT, DELETE, and MERGE.

Here’s a query to identify the most common query types in your account over the last 7 days:

select
query_type,
count(*) as cnt
from snowflake.account_usage.query_history
where
start_time > current_date - 7
group by 1
order by cnt desc

5. Queries run by a user in the last 3 months

To retrieve all the queries run by a user in the last 3 months, you can run the following:

select *
from snowflake.account_usage.query_history
where
user_name='IAN'
and start_time > current_date - 90

6. Metadata only queries

Certain queries don’t require a running virtual warehouse. They can instead be served from the Cloud Services layer using metadata alone. For example, if you run select count(*) from my_table, the query can be answered entirely from metadata and does not need to run on a warehouse. Similarly, a query can avoid running on a warehouse if the exact same query was run in the last 24 hours: Snowflake will instead return the result instantly from the global result cache.
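
One quick way to confirm a query was served this way is to look it up immediately after running it. Here’s a small sketch using the session-scoped table function, where my_table is a placeholder table name:

select count(*) from my_table; -- answered from metadata, no warehouse needed

-- warehouse_size should come back null for the query above
select query_id, query_text, warehouse_name, warehouse_size
from table(snowflake.information_schema.query_history_by_session())
where query_id = last_query_id(-1)
;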

To identify these queries, filter to queries that do not have a listed warehouse size:

select *
from snowflake.account_usage.query_history
where
warehouse_size is null
and start_time > current_date - 90

Queries with high Cloud Services usage

The metadata-only queries mentioned above are generally free: they don’t require a running virtual warehouse, and Snowflake only charges for cloud services when your daily cloud services usage exceeds 10% of your daily warehouse compute usage.
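
To check whether you’re actually over that threshold, you can compare daily cloud services credits against daily compute credits. Here’s a rough sketch using the snowflake.account_usage.metering_daily_history view:

select
usage_date,
sum(credits_used_compute) as compute_credits,
sum(credits_used_cloud_services) as cloud_services_credits,
sum(credits_used_cloud_services) / nullif(sum(credits_used_compute), 0) as cloud_services_ratio -- consistently above 0.10 means you are paying for cloud services
from snowflake.account_usage.metering_daily_history
where usage_date > current_date - 30
group by 1
order by 1 desc
;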

If your Cloud Services usage exceeds 10%, you can identify queries with high cloud services usage using the following query:

select
query_parameterized_hash,
count(*) as num_executions,
any_value(query_text) as example_query_text,
sum(credits_used_cloud_services) as total_credits_used_cloud_services
from snowflake.account_usage.query_history
where
start_time > current_date - 7
group by 1
order by total_credits_used_cloud_services desc
limit 100
;

7. Queries with poor pruning

If the table you are querying is not well clustered or you don’t include a filter to limit the data being accessed, your query will end up scanning a lot of micro-partitions. Because scanning data is expensive, this can result in longer running and more expensive queries.

Here’s an example query to identify queries with poor pruning. For this example, we’ll define poor pruning as queries which scan more than 1GB of data and 80% of the micro-partitions.

select
query_text,
partitions_scanned,
partitions_total,
partitions_scanned/nullif(partitions_total, 0) as fraction_scanned,
bytes_scanned/power(1024,3) as bytes_scanned_gb -- convert to gigabytes
from snowflake.account_usage.query_history
where
start_time > current_date - 7
and bytes_scanned/power(1024,3) > 1
and fraction_scanned > 0.8
order by fraction_scanned desc
limit 100

8. Queries spilling to disk & remote storage

When the virtual warehouse your query is running on runs out of memory, the query will begin spilling to the local disk. Once there is no space left on the local disk, it will start spilling to remote storage. This is both slow and expensive. If it’s not possible to re-write your query to remove the disk spillage (e.g. by processing less data), then it will likely be cheaper to run the query on a larger virtual warehouse.

Here’s how you can identify queries spilling to disk and remote storage:

select
query_text,
partitions_scanned,
partitions_total,
bytes_scanned/power(1024,3) as bytes_scanned_gb, -- convert to gigabytes
bytes_spilled_to_local_storage/power(1024,3) as bytes_spilled_to_local_storage_gb,
bytes_spilled_to_remote_storage/power(1024,3) as bytes_spilled_to_remote_storage_gb,
bytes_spilled_to_local_storage_gb + bytes_spilled_to_remote_storage_gb as total_spillage_gb
from snowflake.account_usage.query_history
where
start_time > current_date - 7
and total_spillage_gb > 0
order by total_spillage_gb desc
limit 100

9. Queries with out-of-memory errors and retries

As discussed above, Snowflake will try to gracefully handle out-of-memory errors by having the query spill to disk. However, sometimes there can be too many queries running concurrently, which causes a node in the warehouse to crash. When this happens, Snowflake will automatically retry the query for you.

Snowflake recently added new query retry columns to help you identify when this is happening.

select
query_text,
query_retry_time,
query_retry_cause,
fault_handling_time,
partitions_scanned,
partitions_total,
bytes_scanned/power(1024,3) as bytes_scanned_gb, -- convert to gigabytes
bytes_spilled_to_local_storage/power(1024,3) as bytes_spilled_to_local_storage_gb,
bytes_spilled_to_remote_storage/power(1024,3) as bytes_spilled_to_remote_storage_gb,
bytes_spilled_to_local_storage_gb + bytes_spilled_to_remote_storage_gb as total_spillage_gb
from snowflake.account_usage.query_history
where
start_time > current_date - 30
and query_retry_time > 0
order by query_retry_time desc
limit 100

The query_retry_cause even includes some tips on how to resolve the issue!

