Everything You Need to Know About Snowflake Dynamic Tables

  • Jeff Skoldberg
    Principal Consultant at Green Mountain Data Solutions

What are Snowflake Dynamic Tables?

In Snowflake, a Dynamic Table is a table that materializes the result of a SQL query and is automatically kept up to date on a schedule. Much like a view, the SQL that defines a dynamic table can be arbitrarily complex: it can join multiple tables, use different join types (left join, full join, Cartesian), unions, calculations, and so on. But unlike a view, which never persists the data, the result of a dynamic table is persisted as a physical table.

Dynamic Tables are refreshed at a frequency you specify, called the “target lag” (target_lag). As a result, selecting (reading) from a dynamic table is often much more performant than reading from a view with many complex joins.

Dynamic tables are a simple and cost-effective way to create performant data pipelines that always stay up to date.

How are Dynamic Tables different than Materialized Views?

In Snowflake, a Materialized View is a view over a single table. It persists the data as a physical table, so it is fast to query, but it is kept up to date in real time, just like a view.

Below are the key differences between Materialized Views and Dynamic Tables.

Joins

In Snowflake, a Materialized View cannot contain joins. This restriction severely limits the usefulness of Materialized Views.
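For instance, a single-table aggregation works fine as a materialized view, but adding a join does not. A minimal sketch, assuming a hypothetical orders table (note that materialized views require Enterprise Edition):

-- OK: a materialized view over a single table
create materialized view order_totals_mv as
select o_custkey, sum(o_totalprice) as total_spend
from orders
group by o_custkey;

-- Not allowed: adding a join to the definition fails, e.g.
-- create materialized view mv as
--   select ... from orders join customer on ... ;  -- error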

Comparison to Other Database Views

A Dynamic Table in Snowflake is much more like a Materialized View in other databases, such as PostgreSQL, where there are very few limitations on the SQL you can write. However, in Postgres, a Materialized View must be manually refreshed with the refresh materialized view command, whereas a Dynamic Table in Snowflake can automatically be refreshed based on target lag.
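For comparison, here is roughly what that looks like in PostgreSQL (a sketch, assuming hypothetical orders and customers tables):

-- PostgreSQL: joins are allowed in a materialized view...
create materialized view customer_orders as
select c.name, count(*) as total_orders
from orders o
join customers c on c.customer_id = o.customer_id
group by c.name;

-- ...but you are responsible for refreshing it yourself
refresh materialized view customer_orders;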

Refresh Frequency

Materialized Views in Snowflake have the advantage that they are always kept up to date. They are real-time; there is nothing you need to do to refresh them.

Dynamic Tables, however, refresh on a schedule. The end user may need to be aware of how fresh the data is.

Query Rewrite

When querying the base tables of a materialized view, Snowflake’s query optimizer may re-write your query to use the materialized view instead.

When querying the base data of a dynamic table, Snowflake does not re-write the query to use the dynamic table.

My thoughts on the differences

While materialized views have two advantages (always fresh, plus query rewrite), the single-table restriction limits their usefulness so much that reaching for a dynamic table will be far more common.

Use a materialized view when all you are doing is aggregating, adding calculations, or otherwise transforming a single table.

Use a Dynamic Table for more complex use cases.

How to create a Dynamic Table

A dynamic table is created with syntax similar to the familiar CTAS, but with a few extra parameters: target_lag, warehouse, and the other optional parameters shown below.

Full syntax, copied from Snowflake docs:

CREATE [ OR REPLACE ] [ TRANSIENT ] DYNAMIC TABLE [ IF NOT EXISTS ] <name> (
-- Column definition
<col_name> <col_type>
[ [ WITH ] MASKING POLICY <policy_name> [ USING ( <col_name> , <cond_col1> , ... ) ] ]
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT '<string_literal>' ]
-- Additional column definitions
[ , <col_name> <col_type> [ ... ] ]
)
TARGET_LAG = { '<num> { seconds | minutes | hours | days }' | DOWNSTREAM }
WAREHOUSE = <warehouse_name>
[ REFRESH_MODE = { AUTO | FULL | INCREMENTAL } ]
[ INITIALIZE = { ON_CREATE | ON_SCHEDULE } ]
[ CLUSTER BY ( <expr> [ , <expr> , ... ] ) ]
[ DATA_RETENTION_TIME_IN_DAYS = <integer> ]
[ MAX_DATA_EXTENSION_TIME_IN_DAYS = <integer> ]
[ COMMENT = '<string_literal>' ]
[ [ WITH ] ROW ACCESS POLICY <policy_name> ON ( <col_name> [ , <col_name> ... ] ) ]
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ REQUIRE USER ]
AS <query>

Practically speaking, your create statement may look like this:

create or replace dynamic table my_table
  target_lag = '1 day'
  warehouse = transforming
  refresh_mode = incremental
as
select
  customers.name,
  count(*) as total_orders
from orders
inner join customers
  using (customer_id)
group by 1;

The refresh is initialized immediately, because the default value for initialize is on_create.
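If you would rather defer that initial backfill until the first scheduled refresh, you can override the default. A sketch, reusing the hypothetical my_table and transforming warehouse from above:

create or replace dynamic table my_table
  target_lag = '1 day'
  warehouse = transforming
  initialize = on_schedule -- wait for the first scheduled refresh
as
select
  customers.name,
  count(*) as total_orders
from orders
inner join customers
  using (customer_id)
group by 1;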

Let’s dig into the two unique arguments you will use the most: target_lag and refresh_mode.

Target Lag

Target lag is the maximum amount of time you want the data to lag behind changes in the source data. It is expressed as an integer plus a time unit (seconds, minutes, hours, days). For example, '5 minutes' is a valid target lag.

If you stack a dynamic table on top of another dynamic table, you can set target_lag = downstream for all dynamic tables in the chain except for the last one, which should have a time schedule. Consider the example below:

[Image: two chained dynamic tables, orders_joined → orders_aggregated]

In the image above, we have two dynamic tables chained together. The first one (orders_joined) has a target lag of downstream. The second one (orders_aggregated) has a target lag of 1 day. In this case, the entire DAG will update once daily. You don’t need to worry about setting a schedule on each table.
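In DDL, that chain might look like this (a sketch; the table, column, and warehouse names are illustrative):

-- upstream table: refreshes only when a downstream table needs it
create or replace dynamic table orders_joined
  target_lag = downstream
  warehouse = transforming
as
select o.*, c.c_mktsegment
from orders o
join customer c on c.c_custkey = o.o_custkey;

-- leaf table: the only place a wall-clock lag is set
create or replace dynamic table orders_aggregated
  target_lag = '1 day'
  warehouse = transforming
as
select c_mktsegment, sum(o_totalprice) as revenue
from orders_joined
group by 1;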

Important Considerations with Target Lag

  • If the last dynamic table in your DAG has a target lag of downstream, your data will never refresh!

  • Snowflake provides this callout in the docs: “Target lag is not a guarantee. Instead, it is a target that Snowflake attempts to meet. Data in dynamic tables is refreshed as closely as possible within the target lag. However, target lag may be exceeded due to factors such as warehouse size, data size, query complexity, and similar factors.”

Refresh Mode

The refresh mode can be auto, full, or incremental. The default refresh mode is auto, which attempts to refresh incrementally and falls back to a full refresh if it cannot.

Dynamic Tables refresh incrementally based on changes in the source data. You don’t need to provide any info about a primary key or how to check for changes; Snowflake does it all for you, like magic! For more info on change tracking, check out the Snowflake Docs and our blog post on Streams.
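If you are curious, you can poke at the same change-tracking mechanism directly with the CHANGES clause. A sketch, assuming a hypothetical orders table (changes are only visible from the moment change tracking was enabled, so the lookback window below assumes tracking has been on for at least an hour):

-- dynamic tables enable change tracking on sources automatically;
-- for a plain table you can turn it on yourself:
alter table orders set change_tracking = true;

-- ...some time later, inspect the row-level changes:
select *
from orders
  changes(information => default)
  at(offset => -3600); -- changes over the last hour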

Streams vs Dynamic Tables

It is worth mentioning that Dynamic Tables are related to Streams in Snowflake, as they use the same change tracking technology under the hood, changes. In fact, these features are maintained by the same team at Snowflake!

Dynamic Tables are a declarative, user friendly way to build a pipeline.

Streams are imperative, and much more customization is involved to make the pipeline work.
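To make the contrast concrete, here is a sketch of the equivalent hand-built pipeline with a stream and a task (names and logic are illustrative, and a real pipeline would typically merge rather than blindly insert):

-- capture changes on the source table
create or replace stream orders_stream on table orders;

-- a task that polls the stream and applies the changes yourself
create or replace task load_orders_summary
  warehouse = transforming -- hypothetical warehouse
  schedule = '60 minute'
when
  system$stream_has_data('ORDERS_STREAM')
as
  insert into orders_summary -- hypothetical target table
  select o_custkey, sum(o_totalprice) as total_spend
  from orders_stream
  group by o_custkey;

-- tasks are created suspended; start it explicitly
alter task load_orders_summary resume;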

Limitations of incremental refresh mode

A dynamic table cannot be refreshed incrementally if:

  • An unsupported SQL function such as current_timestamp or random is used.
  • An unsupported SQL construct is used: pivot, unpivot, union, minus, intersect, except.
    • union all is supported for incremental! Except in some edge cases.
  • Non-identical partition_by clauses are used in various window functions.
  • More than 5% of the data is changed. This is a big one to watch out for!
  • Subquery operators such as in, any, all, exists are used.

There are several more limitations worth watching out for! Above, I only listed the ones I think you will face most often. For a full explanation of incremental refresh limitations for Dynamic Tables, please read this page.
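As an example of how these limitations bite, a definition using current_timestamp should be rejected outright if you ask for incremental refresh (with auto, Snowflake would quietly fall back to a full refresh instead):

-- fails at creation time: current_timestamp is not supported incrementally
create or replace dynamic table recent_orders
  target_lag = '1 hour'
  warehouse = transforming -- hypothetical warehouse
  refresh_mode = incremental
as
select *
from orders
where o_orderdate > dateadd(day, -30, current_timestamp());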

auto refresh may be a nice choice… but…

If you successfully create a Dynamic Table using incremental and one day Snowflake cannot refresh it incrementally due to one of the documented limitations, the refresh will fail. It will fail silently unless you’ve engineered some type of alerting mechanism to monitor the refreshes. To avoid this headache, you can use auto to allow Snowflake to fully refresh your table when needed. Then, monitor your dynamic tables to observe how Snowflake is handling your refreshes.

However, Snowflake does give a warning about using auto refresh mode:

For consistent behavior, explicitly set the refresh mode on all production tables. The behavior of AUTO might change between Snowflake releases, which can cause unexpected changes in performance if used in production pipelines.
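To see which mode auto actually picked for a given table, you can check the refresh_mode column (and refresh_mode_reason, which explains any fallback) in the SHOW output. A quick sketch:

show dynamic tables like 'my_table';

-- the SHOW output can be queried for just the interesting columns
select "name", "refresh_mode", "refresh_mode_reason"
from table(result_scan(last_query_id()));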

Monitoring Dynamic Tables

The Data tab in Snowsight

The easiest way to monitor Dynamic Tables is using the Snowsight UI.

In the Data tab on the left sidebar in Snowsight, navigate to your dynamic table. Click it, then click on the Refresh History tab.

[Image: the Refresh History tab for a dynamic table in Snowsight]

A few super useful things to call out:

  • The Lag Metrics section is quite handy. Here you can observe if the lag has ever exceeded the target lag, how fresh the current data is, and the max lag that ever existed on the table.
  • In the refresh history data section, on the right hand side you will find a hyperlink to the query profile for the refresh (the lightning bolt inside a house icon). This is super useful when troubleshooting long refreshes!

For automated monitoring of refresh failures, see our articles on how to send alerts to Slack and Microsoft Teams.
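You can also pull the same refresh history with SQL, which is a handy basis for those alerts. A sketch using the DYNAMIC_TABLE_REFRESH_HISTORY information schema table function (the error_only argument is my best recollection of the signature, so verify it against the docs):

select name, state, state_message, refresh_start_time, refresh_end_time
from table(information_schema.dynamic_table_refresh_history(error_only => true))
order by refresh_start_time desc;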

The Monitoring tab in Snowsight

Not only can you monitor individual DAGs by selecting any dynamic table from the Data tab, you can also see the status of all dynamic tables in one place using the “Monitoring” tab.

[Image: the Dynamic Tables view in the Snowsight Monitoring tab]

Here, you can sort on “Refresh Status” to find failed refreshes.

Altering / updating a dynamic table

Snowflake provides several properties that can be updated via alter dynamic table. Here is the exhaustive list from Snowflake’s documentation:

ALTER DYNAMIC TABLE [ IF EXISTS ] <name> { SUSPEND | RESUME }
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> RENAME TO <new_name>
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> SWAP WITH <target_dynamic_table_name>
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> REFRESH [ COPY SESSION ]
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> { clusteringAction }
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> { tableColumnCommentAction }
ALTER DYNAMIC TABLE <name> { SET | UNSET } COMMENT = '<string_literal>'
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> dataGovnPolicyTagAction
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> searchOptimizationAction
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> SET
[ TARGET_LAG = { '<num> { seconds | minutes | hours | days }' | DOWNSTREAM } ]
[ WAREHOUSE = <warehouse_name> ]
[ DATA_RETENTION_TIME_IN_DAYS = <integer> ]
[ MAX_DATA_EXTENSION_TIME_IN_DAYS = <integer> ]
[ DEFAULT_DDL_COLLATION = '<collation_specification>' ]
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> UNSET
[ DATA_RETENTION_TIME_IN_DAYS ],
[ MAX_DATA_EXTENSION_TIME_IN_DAYS ],
[ DEFAULT_DDL_COLLATION ]

Examples: How to Alter a Dynamic Table

-- suspend the dynamic table / turn off automatic updates
alter dynamic table my_dynamic_table suspend;

-- turn automatic updates back on
alter dynamic table my_dynamic_table resume;

-- update the target lag
alter dynamic table my_dynamic_table set target_lag = '2 days';

-- change target_lag from a time interval to downstream
alter dynamic table my_dynamic_table set target_lag = downstream;

-- manually trigger the refresh of a dynamic table right now
alter dynamic table my_dynamic_table refresh;

Editing the SQL of a Dynamic Table

It is notable that you cannot add a column or change the SQL via the alter command; you must replace the table completely. This can be done by running drop table my_dynamic_table and then recreating it, or by using create or replace in the DDL: create or replace dynamic table my_dynamic_table... The same goes for refresh_mode: it is not alterable after creation, so changing it also means recreating the table.

End to End Data Pipeline Example

Let’s create a simple example that everyone can follow along with. For this example, we will copy the sample data from the snowflake_sample_data database. (Unfortunately you cannot clone or create dynamic tables on top of data from a share, so we will just duplicate the data, then create the dynamic tables.)

Here is a picture of the DAG we are creating:

[Image: the pipeline DAG, where orders and customer feed obt_orders, which feeds current_month_fulfillments]

Setup

Let’s create some new objects so we are all working from the same starting point.

use role sysadmin;

create warehouse example_wh_xs
  warehouse_size = xsmall
  auto_suspend = 60
  auto_resume = true;

use role securityadmin;

create role example_role;
grant role example_role to user jeff; -- swap in your user
grant all on warehouse example_wh_xs to role example_role;

use role sysadmin;

create database dynamic_demo;
grant ownership on database dynamic_demo to role example_role;

use role example_role;

create schema dynamic_demo.sample_data;

Now let’s copy some sample data into our new database. (Again, clone will not work!)

use schema dynamic_demo.sample_data;

create or replace table orders as
select * from snowflake_sample_data.tpch_sf10.orders;

create or replace table customer as
select * from snowflake_sample_data.tpch_sf10.customer;

Now let’s create two dynamic tables: a table to join our two new tables, and a table which aggregates the data. The first dynamic table in the DAG will have a target lag of downstream. The second dynamic table will have a target lag of 1 day. This will control the lag for the entire DAG.

create or replace dynamic table obt_orders
  target_lag = downstream
  warehouse = example_wh_xs
  refresh_mode = incremental
as
select *
from orders
join customer
  on orders.o_custkey = customer.c_custkey;

create or replace dynamic table current_month_fulfillments
  target_lag = '1 day'
  warehouse = example_wh_xs
  refresh_mode = incremental
as
with current_month as (
  select date_trunc('month', max(o_orderdate)) as current_month_start
  from obt_orders
)
select
  c_mktsegment as market_segment,
  sum(o_totalprice) as order_amount
from obt_orders, current_month
where
  o_orderstatus in ('F', 'P')
  and date_trunc('month', o_orderdate) = current_month_start
group by 1;

Change the source data, run the pipeline

The current_month_fulfillments table should have no data at first, because none of the orders in the most recent month have an F or P status. (Check the where clause in our query above.)

[Image: current_month_fulfillments returning no rows]

Now let’s update the raw data and manually refresh the current_month_fulfillments table:

update orders set o_orderstatus = 'F'
where date_trunc('month', o_orderdate) = '1998-08-01';
-- yes, the max order month in the data is august 1998!
-- 12,466 rows updated
-- manually refresh the last table in the DAG so we don't have to wait:
alter dynamic table current_month_fulfillments refresh;

We can see the first table refreshed automatically, because it was set to target_lag='downstream':

[Image: refresh history for obt_orders showing the automatic refresh]

We see that 12.5K rows were deleted and re-inserted.

When we check the second table in the DAG, we can see that 5 rows were inserted:

[Image: refresh history for current_month_fulfillments showing 5 rows inserted]

Click on the “Graph” tab, which shows a great view of all of the tables in the DAG. The dynamic tables will show a status: “Succeeded” in green or “Failed” in red.

[Image: the Graph tab showing the DAG and the refresh status of each dynamic table]

Clean up

drop database dynamic_demo;
use role sysadmin;
drop warehouse example_wh_xs;
use role securityadmin;
drop role example_role;

Dynamic Tables with dbt

Creating a dynamic table in dbt is as easy as adding this config to your normal .sql file:

{{ config(
    materialized="dynamic_table",
    on_configuration_change="apply" | "continue" | "fail",
    target_lag="downstream" | "<integer> seconds | minutes | hours | days",
    snowflake_warehouse="<warehouse-name>",
    refresh_mode="AUTO" | "FULL" | "INCREMENTAL",
    initialize="ON_CREATE" | "ON_SCHEDULE",
) }}

Or in a property file:

version: 2

models:
  - name: [<model-name>]
    config:
      materialized: dynamic_table
      on_configuration_change: apply | continue | fail
      target_lag: downstream | <time-delta>
      snowflake_warehouse: <warehouse-name>
      refresh_mode: AUTO | FULL | INCREMENTAL
      initialize: ON_CREATE | ON_SCHEDULE

How it works with dbt

When dbt runs the first time, it creates the dynamic table. On subsequent runs, dbt sees that the table already exists and skips it. The table is refreshed only according to its target_lag, not by running dbt.

As mentioned earlier, Snowflake does not support changing the SQL for a dynamic table. Therefore, any change to your model definition requires a --full-refresh.

Running dbt with --full-refresh will drop and re-create the dynamic table.
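In practice that looks like this (assuming a model named my_dynamic_model):

dbt run --select my_dynamic_model --full-refresh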

Comparing Dynamic Tables in dbt vs dbt Incremental

Dynamic Tables:

  • Refresh mechanism is managed by Snowflake, not dbt.
  • Declarative: you only define the select statement, not the incremental logic.

Incremental Models:

  • Refresh is managed by dbt, or whatever is orchestrating dbt.
  • Imperative: you must define custom incremental logic.
  • Use when you need finer control over how a table is incrementally refreshed.

Limitations of Dynamic Tables

Dynamic tables have quite a few limitations. For a full list of limitations, I suggest referring to the docs. But here are the limitations I think you are most likely to face:

  • Dynamic tables cannot be downstream from materialized views, external tables or streams.
  • You can’t create a temporary dynamic table.
  • The Snowflake documentation mentions that you can’t truncate a dynamic table, but it misses mentioning that you cannot perform any DML operation: insert, update, and delete will all fail on a dynamic table. This makes sense, as a dynamic table must track its underlying sources and SQL definition.
  • You can’t set the DATA_RETENTION_TIME_IN_DAYS parameter on your source tables to zero. This is because change tracking in Snowflake leverages Time Travel, so Time Travel must be enabled (see the sketch after this list).
  • The target lag must be less than the data_retention_time_in_days of the upstream tables.
  • You can’t use dynamic SQL (session variables) in dynamic tables.
  • Operations on dynamic tables are not captured by Snowflake’s access_history view.
  • You can’t use sequences. For example, the SQL definition of the dynamic table cannot contain: select my_sequence.nextval
  • You can’t use sample or tablesample in the dynamic table definition.
  • Cloned incremental dynamic tables may full-refresh when initialized.
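For the Time Travel requirements called out above, the fix is to raise retention on the source tables. A sketch, assuming a hypothetical orders source table:

-- keep enough history for incremental refreshes to read changes
alter table orders set data_retention_time_in_days = 7;

-- verify the effective setting
show parameters like 'DATA_RETENTION_TIME_IN_DAYS' in table orders;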

Best practices for Dynamic tables

Here are my top recommendations to consider when using Dynamic Tables:

  • Use the longest target lag acceptable for your use case. This will reduce compute costs by minimizing the number of times your tables re-compute (refresh).
  • Chaining dynamic tables is encouraged. This allows you to create pipelines that are made up entirely of dynamic tables and views, allowing Snowflake to manage the refresh.
  • Set downstream target lag on all tables except for the last table in the DAG.
    • If you have several leaf nodes, you can use a controller table to maintain target lag time (and other properties) in a single table for your account. Example here.
  • Use transient dynamic tables to reduce storage cost.
    • Use higher time travel on the sources.
  • Additional best practices from the Snowflake team can be found here.

Dynamic Tables Pricing

With dynamic tables, you are charged for three main things:

  1. The compute costs associated with refreshing the tables
  2. The storage costs of the dynamic tables themselves
  3. Cloud services costs associated with refreshes, only if they exceed 10% of your daily compute costs

How to Monitor Dynamic Table Costs?

Snowflake recommends using a dedicated warehouse to monitor cost for dynamic tables; however, I would recommend not proliferating warehouses.
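If you do go the dedicated-warehouse route, the cost check is a simple rollup of that warehouse’s credits (a sketch, assuming a hypothetical warehouse named DYNAMIC_TABLE_WH):

select
  date_trunc('day', start_time) as day,
  sum(credits_used) as credits
from snowflake.account_usage.warehouse_metering_history
where warehouse_name = 'DYNAMIC_TABLE_WH'
group by 1
order by 1;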

Instead, you can use a tool like SELECT that will automatically show you the cost of each dynamic table and how that is changing over time.

[Image: SELECT showing the cost of each dynamic table over time]

You can also see the cost of your entire dynamic table DAG, which can often let you spot bigger issues, like things running too frequently.

[Image: SELECT showing the cost of an entire dynamic table DAG]

Wrap Up

Dynamic tables are a great tool to have in your data pipeline toolbelt. The automated refreshes and simplicity of not needing to specify incremental logic make dynamic tables a very attractive option. Just be aware of the many limitations of dynamic tables, especially the limitations we discussed around incremental refresh, which are significant.

Hopefully you now feel ready to use dynamic tables in your data pipelines! I’d love to hear about your experiences with dynamic tables.

Jeff Skoldberg
Principal Consultant at Green Mountain Data Solutions
Jeff Skoldberg is a Data and Analytics Consultant with 15+ years of experience in automating insights and using data to control business processes. From a technology standpoint, he specializes in Snowflake + dbt + Tableau. From a business topic standpoint, he has experience in Public Utility, Clinical Trials, Publishing, CPG, and Manufacturing. Jeff adds unique value for his supply chain clients, as he is well versed in all topics related to planning, forecasting, inventory and supply chain KPIs. Reach out any time! [email protected]
