60x faster database clones in Snowflake

Date
  • Niall Woodward
    Niall Woodward
    Co-founder & CTO of SELECT

Snowflake's zero-copy cloning feature is extremely powerful for quickly creating production replica environments. But, anyone who has cloned a database or schema with a large number of tables has experienced that it can take over ten minutes to complete. In this post we explore a potential solution.

  • 1
  • 22m 34s
  • 0.179 {% /table %}

Experiment 1 - Schema level clones

import snowflake.connector from snowflake.connector import DictCursor def clone_database_by_schema(con, source_database, target_database): con.cursor().execute(f"create database {target_database};") cursor = con.cursor(DictCursor) cursor.execute(f"show schemas in database {source_database};") for i in cursor.fetchall(): if i["name"] not in ("INFORMATION_SCHEMA", "PUBLIC"): con.cursor().execute_async(f"create schema {target_database}.{i['name']} clone {source_database}.{i['name']};") con = snowflake.connector.connect( ... session_parameters={ 'QUERY_TAG': 'test 2', } ) clone_database_by_schema("test", "test_2")

Results: {% table %}

  • Query count
  • Duration
  • Cloud services credits
  • 1012
  • 22s
  • 0.165 {% /table %}

Using 10 threads, the time between the create database command starting and the final create table ... clone command completing was only 22 seconds. This is 60x faster than the create database ... clone command. The bottleneck is still the rate at which queries can be dispatched.

In Summary

The complete results:

{% table %}

  • Clone strategy
  • Query count
  • End-to-end duration
  • Cloud services credits
  • Experiment 1 - Schema level clones
  • 12
  • 1m 47s
  • 0.148
  • Experiment 2 - Table level clones
  • 1012
  • 22s
  • 0.165 {% /table %}

All the queries ran were cloud services only, and did not require a running warehouse or resume a suspended one.

I hope that Snowflake improves their schema and database clone functionality, but in the mean time, cloning tables seems to be the way to go.

Thanks again to Felipe Leite and Stephen Pastan of Miro for sharing this!

Niall Woodward
Niall Woodward
Co-founder & CTO of SELECT

Niall is the Co-Founder & CTO of SELECT, a SaaS Snowflake cost management and optimization platform. Prior to starting SELECT, Niall was a data engineer at Brooklyn Data Company and several startups. As an open-source enthusiast, he's also a maintainer of SQLFluff, and creator of three dbt packages: dbt_artifacts, dbt_snowflake_monitoring and dbt_snowflake_query_tags.

Get up and running with SELECT in 15 minutes.

Snowflake optimization & cost management platform

Gain visibility into Snowflake usage, optimize performance and automate savings with the click of a button.

SELECT web application screenshot