
Identifying unused tables in Snowflake

Ian Whitestone
Co-founder & CEO of SELECT

While Snowflake storage costs tend to be a small portion of overall Snowflake spend, many customers have a significant number of unused tables in their accounts incurring unnecessary charges. If a dataset isn’t being used, isn’t adding value to the business, and isn’t required by law to be retained, it should be removed.

Removing unused datasets can be a quick win for teams looking to reduce their Snowflake spend. It can also improve security and reduce risks associated with data breaches and data exposure. The less data you store, the smaller the footprint for unintended access.

Lastly, deleting unused tables can improve overall data warehouse usability. Unused datasets often contain data that is stale or not meant to be accessed, so removing these tables can help avoid confusion or reporting errors.

In this post we’ll cover how to identify unused tables in Snowflake using the access_history account usage view.

Skip to the final SQL?

If you want to skip ahead and see the final SQL implementation, you can head straight to the end!

Snowflake Access History View

Access History is a view in the Account Usage schema of the Snowflake Database. It is available for all Snowflake accounts on Enterprise Edition or higher. Access History can be used to look up the Snowflake objects (i.e. tables, views, and columns) accessed by each query, either directly or indirectly.
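Each row in the view describes a single query. As a quick orientation, here is a minimal sketch of pulling the columns referenced throughout this post; the 7-day filter and 100-row limit are arbitrary example values, and the column names follow the documented view schema:

-- Minimal look at the raw view; adjust the time window as needed
select
    query_id,
    query_start_time,
    user_name,
    direct_objects_accessed,
    base_objects_accessed
from snowflake.account_usage.access_history
where query_start_time > current_date - 7
limit 100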

Direct versus Base Objects Accessed

To determine which objects were accessed by a query, there are two columns of interest: direct_objects_accessed and base_objects_accessed. A key difference between the two columns comes from how they handle views. Consider the following view definition:

create or replace view orders_view as (
    select *
    from orders
    where
        not test
        and success
);

The query select * from orders_view directly accesses the orders_view object, and indirectly accesses the base orders table. Correspondingly, orders_view will appear in the direct_objects_accessed column of access_history, whereas orders will appear in base_objects_accessed.
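To make the difference concrete, here is roughly what each column would contain for that query. The object IDs and the ANALYTICS.PROD database/schema qualifiers are hypothetical, and the field layout mirrors the documented example shown below (the columns field is omitted here for brevity). direct_objects_accessed records the view that was referenced directly:

[
    {
        "objectDomain": "View",
        "objectId": 1001,
        "objectName": "ANALYTICS.PROD.ORDERS_VIEW"
    }
]

while base_objects_accessed records the underlying table the view reads from:

[
    {
        "objectDomain": "Table",
        "objectId": 1002,
        "objectName": "ANALYTICS.PROD.ORDERS"
    }
]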

When it comes to deciding if a table is unused, it’s important to use base_objects_accessed since this will account for queries that indirectly access a table through a view.

Parsing base_objects_accessed

base_objects_accessed is a JSON array of all base data objects accessed during query execution. Here’s an example of the column’s contents from the documentation:

[
    {
        "columns": [
            {
                "columnId": 68610,
                "columnName": "CONTENT"
            }
        ],
        "objectDomain": "Table",
        "objectId": 66564,
        "objectName": "GOVERNANCE.TABLES.T1"
    }
]

The array of objects accessed by each query can be transformed to one row per object via lateral flatten and then filtered to only consider table objects, as shown below:

with
access_history as (
    select *
    from snowflake.account_usage.access_history
),
access_history_flattened as (
    select
        access_history.query_id,
        access_history.query_start_time,
        access_history.user_name,
        objects_accessed.value:objectId::integer as table_id,
        objects_accessed.value:objectName::text as object_name,
        objects_accessed.value:objectDomain::text as object_domain,
        objects_accessed.value:columns as columns_array
    from access_history, lateral flatten(access_history.base_objects_accessed) as objects_accessed
),
table_access_history as (
    select
        query_id,
        query_start_time,
        user_name,
        object_name as fully_qualified_table_name
    from access_history_flattened
    where
        object_domain = 'Table' -- removes secured views
        and table_id is not null -- removes tables from a data share
)
select *
from table_access_history

Find when a table was last queried/accessed

Using the “flattened” access_history from the query above, we can determine the exact time a table was last accessed along with the user who ran the query:

with
access_history as (
    select *
    from snowflake.account_usage.access_history
),
access_history_flattened as (
    select
        access_history.query_id,
        access_history.query_start_time,
        access_history.user_name,
        objects_accessed.value:objectId::integer as table_id,
        objects_accessed.value:objectName::text as object_name,
        objects_accessed.value:objectDomain::text as object_domain,
        objects_accessed.value:columns as columns_array
    from access_history, lateral flatten(access_history.base_objects_accessed) as objects_accessed
),
table_access_history as (
    select
        query_id,
        query_start_time,
        user_name,
        object_name as fully_qualified_table_name
    from access_history_flattened
    where
        object_domain = 'Table' -- removes secured views
        and table_id is not null -- removes tables from a data share
)
select
    fully_qualified_table_name,
    max(query_start_time) as last_accessed_at,
    max_by(user_name, query_start_time) as last_accessed_by,
    max_by(query_id, query_start_time) as last_query_id
from table_access_history
group by 1

Calculate table storage costs

When identifying unused tables to delete, it’s helpful to see the associated storage costs. Using the table_storage_metrics account usage view and an assumed storage rate of $23 per terabyte per month, the annual storage cost of each table can be calculated:

select
    id as table_id,
    table_catalog || '.' || table_schema || '.' || table_name as fully_qualified_table_name,
    (active_bytes + time_travel_bytes + failsafe_bytes + retained_for_clone_bytes) / power(1024, 4) as total_storage_tb,
    -- Assumes a storage rate of $23/TB/month
    -- Update to the appropriate value based on your Snowflake contract
    total_storage_tb * 12 * 23 as annualized_storage_cost
from snowflake.account_usage.table_storage_metrics
where
    not deleted
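
As a quick sanity check on the formula: a table with 0.5 TB of total storage (active, Time Travel, fail-safe, and clone-retained bytes combined) would come out to roughly 0.5 × $23 × 12 ≈ $138 per year at this assumed rate.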

Identify all tables not queried in the last X days

So far we’ve covered how to determine when a table was last accessed and the storage costs associated with each table. We can tie these building blocks together to identify all tables not queried in the last 30 days (adjust the window to suit your needs) and show the annual savings that could be expected if the tables were deleted.

Standard Edition not supported!

The SQL below relies on the account_usage.access_history view which is only available for Snowflake customers on Enterprise Edition and higher.

Using dbt?

If you are using dbt, consider the alternative version of this SQL below, which runs much faster.

with
access_history as (
    select *
    from snowflake.account_usage.access_history
),
access_history_flattened as (
    select
        access_history.query_id,
        access_history.query_start_time,
        access_history.user_name,
        objects_accessed.value:objectId::integer as table_id,
        objects_accessed.value:objectName::text as object_name,
        objects_accessed.value:objectDomain::text as object_domain,
        objects_accessed.value:columns as columns_array
    from access_history, lateral flatten(access_history.base_objects_accessed) as objects_accessed
),
table_access_history as (
    select
        query_id,
        query_start_time,
        user_name,
        object_name as fully_qualified_table_name,
        table_id
    from access_history_flattened
    where
        object_domain = 'Table' -- removes secured views
        and table_id is not null -- removes tables from a data share
),
table_access_summary as (
    select
        table_id,
        max(query_start_time) as last_accessed_at,
        max_by(user_name, query_start_time) as last_accessed_by,
        max_by(query_id, query_start_time) as last_query_id
    from table_access_history
    group by 1
),
table_storage_metrics as (
    select
        id as table_id,
        table_catalog || '.' || table_schema || '.' || table_name as fully_qualified_table_name,
        (active_bytes + time_travel_bytes + failsafe_bytes + retained_for_clone_bytes) / power(1024, 4) as total_storage_tb,
        -- Assumes a storage rate of $23/TB/month
        -- Update to the appropriate value based on your Snowflake contract
        total_storage_tb * 12 * 23 as annualized_storage_cost
    from snowflake.account_usage.table_storage_metrics
    where
        not deleted
)
select
    table_storage_metrics.*,
    table_access_summary.* exclude (table_id)
from table_storage_metrics
inner join table_access_summary
    on table_storage_metrics.table_id = table_access_summary.table_id
where
    last_accessed_at < (current_date - 30) -- Modify as needed
order by table_storage_metrics.annualized_storage_cost desc

Identifying unused tables with dbt

Querying and flattening the access_history view can be very slow due to the volume of data that must be processed. To speed up queries against table access history, we recommend incrementally materializing this data using our open-source dbt package: dbt_snowflake_monitoring. Once the package is installed, queries to identify unused tables become much simpler. The code from above can be rewritten as:

with
table_access_summary as (
    select
        table_id,
        max(query_start_time) as last_accessed_at,
        max_by(user_name, query_start_time) as last_accessed_by,
        max_by(query_id, query_start_time) as last_query_id
    from query_base_table_access
    group by 1
),
table_storage_metrics as (
    select
        id as table_id,
        table_catalog || '.' || table_schema || '.' || table_name as fully_qualified_table_name,
        (active_bytes + time_travel_bytes + failsafe_bytes + retained_for_clone_bytes) / power(1024, 4) as total_storage_tb,
        -- Assumes a storage rate of $23/TB/month
        -- Update to the appropriate value based on your Snowflake contract
        total_storage_tb * 12 * 23 as annualized_storage_cost
    from snowflake.account_usage.table_storage_metrics
    where
        not deleted
)
select
    table_storage_metrics.*,
    table_access_summary.* exclude (table_id)
from table_storage_metrics
inner join table_access_summary
    on table_storage_metrics.table_id = table_access_summary.table_id
where
    last_accessed_at < (current_date - 30) -- Modify as needed
order by table_storage_metrics.annualized_storage_cost desc

Find when a table was last updated

As part of deciding whether to delete a table, it can be helpful to know when the table was last updated by a DDL or DML operation. The query below shows how to find all tables that were updated in the past week by using the tables account usage view:

select
    table_id,
    table_catalog || '.' || table_schema || '.' || table_name as fully_qualified_table_name,
    last_altered as last_altered_at
from snowflake.account_usage.tables
where
    last_altered > current_date - 7
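
Once a table has been confirmed as unused, with no recent access and no recent updates, removing it is a single statement. A cautious pattern, sketched below with a hypothetical table name, is to rename the table first as a soft delete and drop it after a grace period; a dropped table can still be restored with undrop while it remains within its Time Travel retention window.

-- Hypothetical table name; rename first as a low-risk "soft delete"
alter table analytics.prod.stale_events rename to analytics.prod.stale_events_to_drop;

-- After a grace period with no complaints, drop the table
drop table analytics.prod.stale_events_to_drop;

-- If it turns out to be needed, restore it within the Time Travel retention period
undrop table analytics.prod.stale_events_to_drop;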

Wrapping Up

Removing unused tables represents one of the many cost saving opportunities available to Snowflake users. In addition to surfacing table access patterns, SELECT automatically produces a variety of other optimization recommendations. Get access today or book a demo using the links below.

Ian Whitestone
Co-founder & CEO of SELECT
Ian is the Co-founder & CEO of SELECT, a software product which helps users automatically optimize, understand and monitor Snowflake usage. Prior to starting SELECT, Ian spent 6 years leading full stack data science & engineering teams at Shopify and Capital One. At Shopify, Ian led the efforts to optimize their data warehouse and increase cost observability.
