Everything You Need to Know About Snowflake Dynamic Tables
Jeff Skoldberg, Principal Consultant at Green Mountain Data Solutions
What are Snowflake Dynamic Tables?
In Snowflake, a Dynamic Table is a table that materializes the result of a SQL query and is automatically kept up to date on a schedule. Much like a view, the SQL that defines a dynamic table can include significant complexity: joins across multiple tables, different join types (left join, full join, Cartesian), unions, calculations, etc. But unlike a view, which never persists the data, the result of a dynamic table is persisted as a physical table.
Dynamic Tables are refreshed at a frequency you specify, called “target lag” or target_lag. Therefore, selecting (reading) from a dynamic table is often much more performant than reading a view with a lot of complex joins.
Dynamic tables are a simple and cost effective way to create performant data pipelines that will always stay up to date.
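To make the contrast with a view concrete, here is a minimal sketch (table, column, and warehouse names are illustrative) of the same query defined both ways:

-- a view: the join re-runs every time someone selects from it
create or replace view customer_orders_vw as
select
  customers.name,
  count(*) as total_orders
from orders
inner join customers
  using (customer_id)
group by 1;

-- a dynamic table: the same result, persisted and refreshed automatically
create or replace dynamic table customer_orders_dt
  target_lag = '1 hour'
  warehouse = my_wh
as
select
  customers.name,
  count(*) as total_orders
from orders
inner join customers
  using (customer_id)
group by 1;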
How are Dynamic Tables different than Materialized Views?
In Snowflake, a Materialized View is a view of a single table. It persists as a physical table so it is fast to query, but it is kept up to date in real time, just like a view.
Below are the key differences between Materialized Views and Dynamic Tables.
Joins
In Snowflake, a Materialized View cannot have joins. This is a major restriction that severely limits the usefulness of Materialized Views.
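For example, here is a minimal sketch (names are illustrative) of what is and isn’t allowed:

-- works: a materialized view over a single table
create materialized view order_totals_mv as
select
  o_custkey,
  sum(o_totalprice) as total_spent
from orders
group by o_custkey;

-- fails: Snowflake materialized views cannot contain joins
-- create materialized view order_details_mv as
--   select o_orderkey, c_name
--   from orders join customer on orders.o_custkey = customer.c_custkey;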
Comparison to Other Database Views
A Dynamic Table in Snowflake is much more like a Materialized View in other databases, such as PostgreSQL, where there are very few limitations on the SQL you can write. However, in Postgres, a Materialized View must be manually refreshed with the refresh materialized view command, whereas a Dynamic Table in Snowflake can automatically be refreshed based on target lag.
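For comparison, a minimal PostgreSQL sketch (names are illustrative):

-- PostgreSQL: joins are allowed in the materialized view definition...
create materialized view customer_orders_mv as
select c.name, count(*) as total_orders
from orders o
join customers c on c.customer_id = o.customer_id
group by c.name;

-- ...but the data only updates when you refresh it yourself
refresh materialized view customer_orders_mv;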
Refresh Frequency
Materialized views in Snowflake have the advantage that they are always kept up to date. They are real-time, with nothing you need to do to refresh them.
Dynamic Tables, however, refresh on a schedule. The end user may need to be aware of how fresh the data is.
Query Rewrite
When querying the base tables of a materialized view, Snowflake’s query optimizer may rewrite your query to use the materialized view instead.
When querying the base data of a dynamic table, Snowflake does not rewrite the query to use the dynamic table.
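As a sketch of what that means in practice (hypothetical names; whether a rewrite happens is up to the optimizer):

-- a materialized view that pre-aggregates a large base table
create materialized view daily_sales_mv as
select o_orderdate, sum(o_totalprice) as daily_total
from orders
group by o_orderdate;

-- this query targets the base table, but the optimizer may transparently
-- scan daily_sales_mv instead; the equivalent query against a dynamic
-- table's base data will never be redirected to the dynamic table
select o_orderdate, sum(o_totalprice)
from orders
group by o_orderdate;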
My thoughts on the differences
While materialized views have two advantages (always fresh and query rewrite), the single-table restriction limits their usefulness so much that reaching for a dynamic table will be far more common.
Use a materialized view when all you are doing is aggregating, adding calculations, or otherwise transforming a single table.
Use a Dynamic Table for more complex use cases.
How to create a Dynamic Table
A dynamic table is created much like the familiar CTAS, but with a few extra parameters: target_lag, warehouse, and many more optional parameters shown below.
Full syntax, copied from Snowflake docs:
CREATE [ OR REPLACE ] [ TRANSIENT ] DYNAMIC TABLE [ IF NOT EXISTS ] <name> (
-- Column definition
<col_name> <col_type>
[ [ WITH ] MASKING POLICY <policy_name> [ USING ( <col_name> , <cond_col1> , ... ) ] ]
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT '<string_literal>' ]
-- Additional column definitions
[ , <col_name> <col_type> [ ... ] ]
)
TARGET_LAG = { '<num> { seconds | minutes | hours | days }' | DOWNSTREAM }
WAREHOUSE = <warehouse_name>
[ REFRESH_MODE = { AUTO | FULL | INCREMENTAL } ]
[ INITIALIZE = { ON_CREATE | ON_SCHEDULE } ]
[ CLUSTER BY ( <expr> [ , <expr> , ... ] ) ]
[ DATA_RETENTION_TIME_IN_DAYS = <integer> ]
[ MAX_DATA_EXTENSION_TIME_IN_DAYS = <integer> ]
[ COMMENT = '<string_literal>' ]
[ [ WITH ] ROW ACCESS POLICY <policy_name> ON ( <col_name> [ , <col_name> ... ] ) ]
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ REQUIRE USER ]
AS <query>
Practically speaking, your create statement may look like this:
create or replace dynamic table my_table
  target_lag = '1 day'
  warehouse = transforming
  refresh_mode = incremental
as
select
  customers.name,
  count(*) as total_orders
from orders
inner join customers
  using (customer_id)
group by 1;
The refresh is initialized immediately, because the default value for initialize is on_create.
Let’s dig into the two unique arguments you will use the most: target_lag and refresh_mode.
Target Lag
Target lag is the maximum amount of time you want the data to lag behind changes in the source data. It is expressed as an integer and a datetime part (seconds, minutes, hours, days). For example, 5 minutes is a valid target lag.
If you stack a dynamic table on top of another dynamic table, you can set target_lag = downstream for all dynamic tables in the chain except for the last one, which should have a time schedule. Consider the example below:
In the image above, we have two dynamic tables chained together. The first one (orders_joined) has a target lag of downstream. The second one (orders_aggregated) has a target lag of 1 day. In this case, the entire DAG will update once daily. You don’t need to worry about setting a schedule on each table.
Important Considerations with Target Lag
If the last dynamic table in your DAG has a target lag of downstream, your data will never refresh!

Snowflake provides this callout in the docs: “Target lag is not a guarantee. Instead, it is a target that Snowflake attempts to meet. Data in dynamic tables is refreshed as closely as possible within the target lag. However, target lag may be exceeded due to factors such as warehouse size, data size, query complexity, and similar factors.”
Refresh Mode
The Refresh Mode can be auto, full, or incremental. The default refresh mode is auto, which attempts to refresh incrementally and falls back to a full refresh if it cannot.
Dynamic Tables refresh incrementally based on changes in the source data. You don’t need to provide any info about a primary key or how to check for changes; Snowflake does it all for you, like magic! For more info on changes, check out the Snowflake Docs and our blog post on Streams.
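If you want to see that change tracking in action for yourself, here is a minimal sketch (the table name is illustrative) using the changes clause on a regular table:

-- change tracking must be enabled on the table first
alter table orders set change_tracking = true;

-- then you can query the rows that changed over a time window,
-- e.g. everything that changed in the last hour
select *
from orders
  changes(information => default)
  at(offset => -3600);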
Streams vs Dynamic Tables
It is worth mentioning that Dynamic Tables are related to Streams in Snowflake, as they use the same change tracking technology under the hood, changes
. In fact, these features are maintained by the same team at Snowflake!
Dynamic Tables are a declarative, user friendly way to build a pipeline.
Streams are imperative, and much more customization is involved to make the pipeline work.
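To illustrate the difference, here is a minimal stream-and-task sketch (object names, schedule, and the simplified load logic are all illustrative) of the kind of plumbing you maintain by hand with Streams:

-- capture row-level changes on the source table
create or replace stream orders_stream on table orders;

-- a task you schedule, monitor, and resume yourself
create or replace task load_order_changes
  warehouse = my_wh
  schedule = '60 minute'
when
  system$stream_has_data('ORDERS_STREAM')
as
  insert into order_changes
  select * from orders_stream;

-- tasks are created suspended, so you must remember to resume them
alter task load_order_changes resume;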
Limitations of incremental refresh mode
A dynamic table cannot be refreshed incrementally if:
- An unsupported SQL function such as current_timestamp or random is used.
- An unsupported SQL construct is used: pivot, unpivot, union, minus, intersect, except. (union all is supported for incremental! Except in some edge cases.)
- Non-identical partition by clauses are used in various window functions.
- More than 5% of the data is changed. This is a big one to watch out for!
- Subquery operators such as in, any, all, exists are used.
There are several more limitations worth watching out for! Above, I only listed the limitations I think you will face most often. For a full explanation of Incremental Refresh limitations for Dynamic Tables, please read this page.
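One way to sanity-check what you ended up with is show dynamic tables, whose output includes the refresh mode. A quick sketch (worth verifying the exact column names against the docs):

show dynamic tables like 'my_table%';

-- inspect the refresh mode Snowflake actually chose, and why
select "name", "refresh_mode", "refresh_mode_reason"
from table(result_scan(last_query_id()));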
auto refresh may be a nice choice… but…
If you successfully create a Dynamic Table using incremental, and one day Snowflake cannot refresh it incrementally due to one of the documented limitations, the refresh will fail. It will fail silently unless you’ve engineered some type of alerting mechanism to monitor the refreshes. To avoid this headache, you can use auto to allow Snowflake to run a full refresh of your table when needed. Then, monitor your dynamic tables to observe how Snowflake is handling your refreshes.
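For example, here is a sketch of such a monitoring query using the information_schema table function (exact column names are worth double-checking against the docs):

-- recent refreshes; refresh_action shows whether each one
-- ran as INCREMENTAL or fell back to FULL
select
  name,
  state,
  refresh_action,
  refresh_start_time,
  refresh_end_time
from table(information_schema.dynamic_table_refresh_history())
order by refresh_start_time desc
limit 20;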
However, Snowflake does give a warning about using auto refresh mode:

“For consistent behavior, explicitly set the refresh mode on all production tables. The behavior of AUTO might change between Snowflake releases, which can cause unexpected changes in performance if used in production pipelines.”
Monitoring Dynamic Tables
The Data tab in Snowsight
The easiest way to monitor Dynamic Tables is using the Snowsight UI.
In the Data tab on the left sidebar in Snowsight, navigate to your dynamic table. Click it, then click on the Refresh History tab.
A few super useful things to call out:
- The Lag Metrics section is quite handy. Here you can observe if the lag has ever exceeded the target lag, how fresh the current data is, and the max lag that ever existed on the table.
- In the refresh history data section, on the right hand side you will find a hyperlink to the query profile for the refresh (the lightning-bolt-inside-a-house icon). This is super useful when troubleshooting long refreshes!
For automated monitoring of refresh failures, see our articles on how to send alerts to Slack and Microsoft Teams.
The Monitoring tab in Snowsight
Not only can you monitor individual DAGs by selecting any dynamic table from the Data tab, but you can also see the status of all dynamic tables in one place using the “Monitoring” tab.
Here, you can sort on “Refresh Status” to find failed refreshes.
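If you prefer SQL over the UI for this, here is a sketch of the same check (again, verify column names against the docs):

-- recent failed refreshes across your dynamic tables
select name, state, state_message, refresh_start_time
from table(information_schema.dynamic_table_refresh_history())
where state = 'FAILED'
order by refresh_start_time desc;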
Altering / updating a dynamic table
Snowflake provides several properties that can be updated via alter dynamic table. Here is the exhaustive list from Snowflake’s documentation:
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> { SUSPEND | RESUME }
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> RENAME TO <new_name>
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> SWAP WITH <target_dynamic_table_name>
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> REFRESH [ COPY SESSION ]
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> { clusteringAction }
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> { tableColumnCommentAction }
ALTER DYNAMIC TABLE <name> { SET | UNSET } COMMENT = '<string_literal>'
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> dataGovnPolicyTagAction
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> searchOptimizationAction
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> SET
[ TARGET_LAG = { '<num> { seconds | minutes | hours | days }' | DOWNSTREAM } ]
[ WAREHOUSE = <warehouse_name> ]
[ DATA_RETENTION_TIME_IN_DAYS = <integer> ]
[ MAX_DATA_EXTENSION_TIME_IN_DAYS = <integer> ]
[ DEFAULT_DDL_COLLATION = '<collation_specification>' ]
ALTER DYNAMIC TABLE [ IF EXISTS ] <name> UNSET
[ DATA_RETENTION_TIME_IN_DAYS ],
[ MAX_DATA_EXTENSION_TIME_IN_DAYS ],
[ DEFAULT_DDL_COLLATION ]
Examples: How to Alter a Dynamic Table
-- Suspend the dynamic table / turn off automatic updates
alter dynamic table my_dynamic_table suspend;
-- turn auto updates back on
alter dynamic table my_dynamic_table resume;
-- update target lag
alter dynamic table my_dynamic_table set target_lag = '2 days';
-- change target_lag from a time interval to downstream
alter dynamic table my_dynamic_table set target_lag = downstream;
-- note: refresh_mode does not appear in the alter syntax above; it can only
-- be set when the table is created, so changing it requires create or replace
-- manually trigger the refresh of a dynamic table right now
alter dynamic table my_dynamic_table refresh;
Editing the SQL of a Dynamic Table
It is notable that you cannot add a column or change the SQL via alter
command; you must replace the table completely. This can be done by drop table my_dynamic_table
then recreating it. Or, you can use create or replace
in the DDL: create or replace dynamic table my_dynamic_table...
End to End Data Pipeline Example
Let’s create a simple example that everyone can follow along with. For this example, we will be copying the sample data from the snowflake_sample_data database. (Unfortunately, you cannot clone or create dynamic tables on top of data from a share, so we will just duplicate the data, then create the dynamic tables.)
Here is a picture of the DAG we are creating:
Setup
Let’s create some new objects so we are all working from the same starting point.
use role sysadmin;
create warehouse example_wh_xs
warehouse_size = xsmall
auto_suspend = 60
auto_resume = true;
use role securityadmin;
create role example_role;
grant role example_role to user jeff; -- swap in your user
grant all on warehouse example_wh_xs to role example_role;
use role sysadmin;
create database dynamic_demo;
grant ownership on database dynamic_demo to role example_role;
use role example_role;
create schema dynamic_demo.sample_data;
Now let’s copy some sample data into our new database. (Again, clone will not work!)
use schema dynamic_demo.sample_data;
create or replace table orders as
select * from
snowflake_sample_data.tpch_sf10.orders;
create or replace table customer as
select * from
snowflake_sample_data.tpch_sf10.customer;
Now let’s create two dynamic tables: a table to join our two new tables, and a table which aggregates the data. The first dynamic table in the DAG will have a target lag of downstream. The second dynamic table will have a target lag of 1 day. This will control the lag for the entire DAG.
create or replace dynamic table obt_orders
  target_lag = downstream
  warehouse = example_wh_xs
  refresh_mode = incremental
as
select *
from orders
join customer
  on orders.o_custkey = customer.c_custkey;
create or replace dynamic table current_month_fulfillments
  target_lag = '1 day'
  warehouse = example_wh_xs
  refresh_mode = incremental
as
with current_month as (
  select date_trunc('month', max(o_orderdate)) as current_month_start
  from obt_orders
)
select
  c_mktsegment as market_segment,
  sum(o_totalprice) as order_amount
from obt_orders, current_month
where
  o_orderstatus in ('F','P')
  and date_trunc('month', o_orderdate) = current_month_start
group by 1;
Change the source data, run the pipeline
The current_month_fulfillments table should have no data, because none of the orders have F or P status for that month. (Check the where clause on our query above.)

Now let’s update the raw data and manually refresh the current_month_fulfillments table:
update orders set o_orderstatus = 'F'
where date_trunc('month', o_orderdate) = '1998-08-01';
-- yes, the max order month in the data is august 1998!
-- 12,466 rows updated
-- manually refresh the last table in the DAG so we don't have to wait:
alter dynamic table current_month_fulfillments refresh;
We can see the first table refreshed automatically, because it was set to target_lag = downstream:
We see that 12.5K rows were deleted and re-inserted.
When we check the second table in the DAG, we can see that 5 rows were inserted:
Click on the “Graph” tab, which shows a great view of all of the tables in the DAG. The dynamic tables will show a status: “Succeeded” in green or “Failed” in red.
Clean up
drop database dynamic_demo;
use role sysadmin;
drop warehouse example_wh_xs;
use role securityadmin;
drop role example_role;
Dynamic Tables with dbt
Creating a dynamic table in dbt is as easy as adding this config to your normal .sql file:
{{ config(
materialized="dynamic_table",
on_configuration_change="apply" | "continue" | "fail",
target_lag="downstream" | "<integer> seconds | minutes | hours | days",
snowflake_warehouse="<warehouse-name>",
refresh_mode="AUTO" | "FULL" | "INCREMENTAL",
initialize="ON_CREATE" | "ON_SCHEDULE",
) }}
Or in a property file:
version: 2
models:
  - name: [<model-name>]
    config:
      materialized: dynamic_table
      on_configuration_change: apply | continue | fail
      target_lag: downstream | <time-delta>
      snowflake_warehouse: <warehouse-name>
      refresh_mode: AUTO | FULL | INCREMENTAL
      initialize: ON_CREATE | ON_SCHEDULE
How it works with dbt
When dbt runs the first time, it will create the dynamic table. On subsequent runs, it will see the table already exists and skip it. The table is only refreshed according to the target_lag, not by running dbt.

As mentioned earlier, Snowflake does not support changing the SQL for a dynamic table. Therefore, any change to your model definition requires a --full-refresh. Running dbt with --full-refresh will drop and re-create the dynamic table.
Comparing Dynamic Tables in dbt vs dbt Incremental
Dynamic Tables:
- Refresh mechanism is managed by Snowflake, not dbt.
- Declarative: you only define the select statement, not the incremental logic.
Incremental Models:
- Refresh is managed by dbt, or whatever is orchestrating dbt.
- Imperative: you must define custom incremental logic.
- Use when you need finer control over how a table is incrementally refreshed.
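For contrast, here is a minimal sketch of the imperative version (model, column, and source names are illustrative):

-- models/order_summary.sql
{{ config(materialized='incremental', unique_key='order_id') }}

select order_id, customer_id, order_total, updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- you write the change-detection logic yourself
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}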
Limitations of Dynamic Tables
Dynamic tables have quite a few limitations. For a full list, I suggest referring to the docs, but here are the ones I think you are most likely to face:
- Dynamic tables cannot be downstream from materialized views, external tables, or streams.
- You can’t create a temporary dynamic table.
- The Snowflake documentation mentions you can’t truncate a dynamic table, but it doesn’t mention that you cannot do any DML operation: insert, update, and delete will all fail on a dynamic table. This makes sense, as a dynamic table must track its underlying sources and SQL definition.
- You can’t set the DATA_RETENTION_TIME_IN_DAYS parameter on your source tables to zero. This is because changes in Snowflake leverages time travel, so time travel must be enabled.
- The target lag must be less than the data_retention_time_in_days of the upstream tables (see the sketch after this list).
- You can’t use dynamic SQL (session variables) in dynamic tables.
- Operations on dynamic tables are not captured by Snowflake’s access_history view.
- You can’t use sequences. For example, the SQL definition of the dynamic table cannot contain: select my_sequence.nextval
- You can’t use sample or tablesample in the dynamic table definition.
- Cloned incremental dynamic tables may full-refresh when initialized.
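Related to the two time travel bullets above, here is a small sketch (the retention value is illustrative) for raising and checking retention on a source table:

-- keep enough time travel on the source to comfortably cover the target lag
alter table orders set data_retention_time_in_days = 7;
show parameters like 'DATA_RETENTION_TIME_IN_DAYS' in table orders;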
Best practices for Dynamic tables
Here are my top recommendations to consider when using Dynamic Tables:
- Have the longest target lag possible for your use case. This will help you reduce compute costs by minimizing the number of times your tables re-compute (refresh).
- Chaining dynamic tables is encouraged. This allows you to create pipelines that are made up entirely of dynamic tables and views, allowing Snowflake to manage the refresh.
- Set downstream target lag on all tables except for the last table in the DAG.
  - If you have several leaf nodes, you can use a controller table to maintain target lag time (and other properties) in a single table for your account. Example here.
- Use transient dynamic tables to reduce storage cost (see the sketch after this list).
- Use higher time travel on the sources.
- Additional best practices from the Snowflake team can be found here.
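As a sketch of the transient recommendation (names are illustrative), the transient keyword from the create syntax earlier simply goes before dynamic table:

-- transient tables skip fail-safe storage, reducing storage cost
create or replace transient dynamic table orders_summary
  target_lag = '1 day'
  warehouse = my_wh
as
select o_custkey, count(*) as order_count
from orders
group by o_custkey;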
Dynamic Tables Pricing
With dynamic tables, you are charged for three main things:
- The compute costs associated with refreshing the tables
- The storage costs of the dynamic tables themselves
- Cloud services costs associated with refreshes, which are billed only if your cloud services usage exceeds 10% of your daily compute costs
How to Monitor Dynamic Table Costs?
Snowflake recommends using a dedicated warehouse to monitor cost for dynamic tables; however, I would recommend not proliferating warehouses.
Instead, you can use a tool like SELECT, which will automatically show you the cost of each dynamic table and how that is changing over time.
You can also see the cost of your entire dynamic table DAG, which can often let you spot bigger issues, like things running too frequently.
Wrap Up
Dynamic tables are a great tool to have in your data pipeline toolbelt. The automated refreshes and simplicity of not needing to specify incremental logic make dynamic tables a very attractive option. Just be aware of the many limitations of dynamic tables, especially the limitations we discussed around incremental refresh, which are significant.
Hopefully you now feel ready to use dynamic tables in your data pipelines! I’d love to hear about your experiences with dynamic tables.