Niall WoodwardSaturday, October 22, 2022
I had the pleasure of attending dbt’s Coalesce conference in London last week, and dropped into a really great talk by Felipe Leite and Stephen Pastan of Miro. They mentioned how they’d achieved a considerable speed improvement by switching database clones out for multiple table clones. I had to check it out.
Results were collected using the following query which measures the duration of each process by passing in the earliest query start time and last query end time to the DATEDIFF function:
Create a database with 10 schemas, 100 tables in each:
This operation took 22m 34s to execute.
Results:
Query Count | 1 |
---|---|
Duration | 22m 34s |
Cloud services credits | 0.179 |
Results:
Query Count | 12 |
---|---|
Duration | 1m 47s |
Cloud services credits | 0.148 |
Using execute_async
executes each SQL statement without waiting for each to complete, resulting in all 10 schemas being cloned concurrently. A whopping 10x faster from start to finish compared with the regular database clone.
This took 1 minute 48s to complete, the limiting factor being the rate at which the queries could be dispatched by the client (likely due to network waiting times). To help mitigate that, I distributed the commands across 10 threads:
Results:
Query Count | 1012 |
---|---|
Duration | 22s |
Cloud services credits | 0.165 |
Using 10 threads, the time between the create database
command starting and the final create table ... clone
command completing was only 22 seconds. This is 60x faster than the create database ... clone
command. The bottleneck is still the rate at which queries can be dispatched.
The complete results:
Clone Strategy | Control - Database clone | Experiment 1 - Schema level clones | Experiment 2 - Table level clones |
---|---|---|---|
Query count | 1 | 12 | 1012 |
Duration | 22m 34s | 1m 47s | 22s |
Cloud services credits | 0.179 | 0.148 | 0.165 |
All the queries ran were cloud services only, and did not require a running warehouse or resume a suspended one.
I hope that Snowflake improves their schema and database clone functionality, but in the mean time, cloning tables seems to be the way to go.
Thanks again to Felipe Leite and Stephen Pastan of Miro for sharing this!
Niall is the Co-Founder & CTO of SELECT, a SaaS Snowflake cost management and optimization platform. Prior to starting SELECT, Niall was a data engineer at Brooklyn Data Company and several startups. As an open-source enthusiast, he's also a maintainer of SQLFluff, and creator of three dbt packages: dbt_artifacts
, dbt_snowflake_monitoring
and dbt_snowflake_query_tags
.
Want to hear about our latest Snowflake learnings?Subscribe to get notified.
Connect your Snowflake account and instantly understand your savings potential.