Essential Snowflake Cost Reduction Strategies
- Niall Woodward
Snowflake is an incredibly powerful platform, easily scaling to handle ever-larger data volumes without compromising on performance. But, if not controlled, costs associated with this scaling quickly climb. Whether your goal is to reduce the price of an upcoming renewal, extend your existing contract's runway, or reduce on-demand costs, use the strategies in this post to make significant savings.
The cost reduction techniques in this post fall into three categories:
- Virtual warehouse cost optimization
- Workload configuration
- Reducing billable storage
Let's get straight into it.
Virtual warehouse cost optimization
Reduce virtual warehouse auto-suspend thresholds
Use 60-second auto-suspend timeouts for all virtual warehouses. The only exception is user-facing workloads where low latency is paramount and the warehouse cache is frequently being used. If you're not sure, go for 60 seconds, and increase it only if performance suffers.
Auto-suspend settings have a big impact on the bill because Snowflake charges for every second a virtual warehouse is running, with a minimum of 60 seconds each time it resumes. For this reason, we don't recommend setting auto-suspend below 60 seconds: a warehouse that repeatedly suspends and resumes incurs the 60-second minimum charge on every resume, which can cost more than simply staying on. Auto-suspend settings over 60 seconds result in virtual warehouses being billable while processing no queries. By default, all virtual warehouses created via the user interface have auto-suspend periods of 10 minutes, so be careful when creating new warehouses, too.
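Applying this is a one-line change per warehouse. A minimal sketch (the warehouse name `transforming` is a placeholder):

```sql
-- Audit existing settings; the auto_suspend column is in seconds
show warehouses;

-- Set a 60-second auto-suspend on an existing warehouse
alter warehouse transforming set auto_suspend = 60;
```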
Reduce virtual warehouse size
Virtual warehouse compute resources and cost double with each size increase, so costs grow exponentially as you move up the sizes. Here's a quick reminder, with hourly costs displayed as credits (and dollars, assuming a typical rate of $2.50 per credit).

Hourly virtual warehouse pricing:

| Size | X-Small | Small | Medium | Large | X-Large | 2X-Large | 3X-Large | 4X-Large | 5X-Large | 6X-Large |
|---|---|---|---|---|---|---|---|---|---|---|
| Cost (credits/hour) | 1 ($2.50) | 2 ($5) | 4 ($10) | 8 ($20) | 16 ($40) | 32 ($80) | 64 ($160) | 128 ($320) | 256 ($640) | 512 ($1,280) |
Here's the monthly pricing for each warehouse size assuming it runs continuously. Warehouses don't typically run 24/7 thanks to auto-suspend, but monthly figures give a better sense of cost than hourly ones:

Monthly virtual warehouse pricing:

| Size | X-Small | Small | Medium | Large | X-Large | 2X-Large | 3X-Large | 4X-Large | 5X-Large | 6X-Large |
|---|---|---|---|---|---|---|---|---|---|---|
| Cost (credits/month) | 720 ($1,800) | 1,440 ($3,600) | 2,880 ($7,200) | 5,760 ($14,400) | 11,520 ($28,800) | 23,040 ($57,600) | 46,080 ($115,200) | 92,160 ($230,400) | 184,320 ($460,800) | 368,640 ($921,600) |
Over-sized warehouses can sometimes make up the majority of Snowflake usage. Reduce warehouse sizes and observe the impact on workloads. If performance is still acceptable, try reducing size again. Check out our full guide to choosing the right warehouse size in Snowflake.
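Downsizing is another one-line change and applies to newly submitted queries immediately. A sketch, again using a placeholder warehouse name:

```sql
-- Step down one size and observe workload performance;
-- if it's still acceptable, try stepping down again
alter warehouse transforming set warehouse_size = 'SMALL';
```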
Ensure minimum clusters are set to 1
Snowflake Enterprise editions or higher provide multi-cluster warehousing, allowing warehouses to add additional clusters in parallel to handle increased demand. The minimum cluster count setting should always be set to 1 to avoid over-provisioning. Snowflake will automatically add clusters up to the maximum cluster count with minimal provisioning time, as needed. Minimum cluster counts higher than 1 lead to unused and billable clusters.
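On Enterprise edition or higher, the cluster bounds can be set like so (warehouse name and maximum are illustrative):

```sql
-- Keep the minimum at 1; Snowflake adds clusters up to the
-- maximum automatically when queries start to queue
alter warehouse transforming set
    min_cluster_count = 1
    max_cluster_count = 4;
```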
Consolidate virtual warehouses
Last, and perhaps most significant: consolidate virtual warehouses. The enemy of cost efficiency is wastage, and the best way to ensure virtual warehouses are utilized efficiently is to use as few as possible. Where separate warehouses are needed, create them based on performance requirements rather than workload domains.
For example, creating one warehouse for all data loading, one for transformations, and one for live BI querying will lead to better cost efficiency than one warehouse for marketing data and one for finance data. All data-loading workloads typically have the same performance requirements (tolerate some queueing) and can often share a multi-cluster X-Small warehouse. In contrast, all live, user-facing queries may benefit from a larger warehouse to reduce latency.
Where workloads within each category (loading, transformation, live querying, etc.) need a larger warehouse size for acceptable query speeds, create a new larger warehouse just for those. For best cost efficiency, queries should always run on the smallest warehouse they perform sufficiently quickly on.
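A consolidated setup per performance category might look like the following sketch. The warehouse names, sizes, and cluster counts are illustrative, and the multi-cluster settings assume Enterprise edition:

```sql
-- One warehouse per performance category, not per team or domain
create warehouse if not exists loading
    warehouse_size = 'XSMALL'
    min_cluster_count = 1
    max_cluster_count = 4
    auto_suspend = 60
    initially_suspended = true;

create warehouse if not exists bi
    warehouse_size = 'SMALL'  -- larger to reduce latency for live queries
    min_cluster_count = 1
    max_cluster_count = 4
    auto_suspend = 60
    initially_suspended = true;
```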
If you still need spend reporting grouped by domain after consolidating, SELECT users can monitor cost per user across all warehouses.
Workload configuration
Now that your virtual warehouses are configured to run cost-efficiently, it's time to look at the queries running within them. The biggest source of compute costs in Snowflake is typically batch data transformation jobs. There are two main strategies to improve their cost efficiency: reduce run frequency, and incrementalize.
Reduce run frequency
At many organizations, batch data transformation jobs run hourly by default. But do downstream use cases actually need such low latency? Here are some examples of how reducing run frequency can have an immediate impact on cost. We assume all workloads are non-incremental, performing a full data refresh each run, and that running hourly costs $100,000 per year.
| Run frequency | Annual cost | Saving |
|---|---|---|
| Hourly on weekdays, daily at the weekend | $75,000 | $25,000 |
| Hourly during working hours | $50,000 | $50,000 |
| Once at the start of the working day, once around midday | $8,000 | $92,000 |
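If your transformations run as Snowflake tasks, the schedule change is a one-line cron edit. A sketch assuming a task named `daily_transformations` and a hypothetical `run_transformations()` stored procedure:

```sql
-- Run at 08:00 and 12:00 UTC instead of hourly
-- (cron fields: minute hour day-of-month month day-of-week)
create or replace task daily_transformations
    warehouse = transforming
    schedule = 'USING CRON 0 8,12 * * * UTC'
as
    call run_transformations();

-- Tasks are created suspended, so resume to activate the schedule
alter task daily_transformations resume;
```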
A challenge here is knowing which workloads can be reduced in frequency. SELECT identifies where transformations refresh at a higher frequency than their downstream queries, automatically recommending schedule changes and deletions for unused workloads.
Only process new or updated data
Significant volumes of our data are often immutable, meaning they don't change once created; web and shipment events are typical examples. Other data does change, but rarely beyond a certain time interval: orders, for instance, are unlikely to be altered more than a month after purchase.
Rather than reprocessing all data in every batch data transformation job, incrementalization can be used to filter for only records which are new or updated within a certain time window, perform the transformations, and then insert or update that data into the final table.
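For the orders example above, the upsert could be sketched with a MERGE that only scans recently changed rows. All table and column names here are hypothetical, and the filter window includes a small buffer beyond the one-month mutability assumption:

```sql
-- Incrementally upsert only recently-changed orders into the target table
merge into analytics.orders as target
using (
    select order_id, customer_id, amount, updated_at
    from raw.orders
    where updated_at >= dateadd(day, -35, current_date())
) as source
on target.order_id = source.order_id
when matched then update set
    amount = source.amount,
    updated_at = source.updated_at
when not matched then insert (order_id, customer_id, amount, updated_at)
    values (source.order_id, source.customer_id, source.amount, source.updated_at);
```

For truly immutable event data, the MERGE can be replaced with a plain filtered INSERT, which avoids the join against the target table entirely.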
For a year-old events table, the impact of switching to incremental, insert-only transformations for new records only could reduce costs by 99%. For a year-old orders table, re-processing and updating only the last month’s and new orders could reduce costs by 90% compared with full refreshes.
Reducing billable storage
While typically only a small (<20%) component of the total Snowflake bill, unused tables and time-travel backups can eat away at credits. If you're using our dbt-snowflake-monitoring package, you can quickly identify all unused tables to drop with this example. If not, use the TABLE_STORAGE_METRICS view to rank tables by TOTAL_BILLABLE_BYTES and find the costliest ones:

```sql
select
    table_catalog as database_name,
    table_schema as schema_name,
    table_name,
    active_bytes + time_travel_bytes + failsafe_bytes + retained_for_clone_bytes as total_billable_bytes
from snowflake.account_usage.table_storage_metrics
order by total_billable_bytes desc
limit 10;
```
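Once the costliest tables are identified, storage can be reclaimed by dropping unused tables and trimming time-travel retention where long retention isn't needed. The names below are placeholders:

```sql
-- Drop a table nothing queries any more
drop table if exists analytics.stg_events_backup;

-- Reduce time-travel retention on a large table
-- (Enterprise edition allows up to 90 days; each retained
-- day adds billable time-travel and fail-safe storage)
alter table analytics.events set data_retention_time_in_days = 1;
```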
We hope you found these recommendations helpful. If you're looking to reduce your Snowflake spend, put guardrails in place and get better visibility into costs, use SELECT. SELECT is a Snowflake cost optimization product designed to give users complete control of their Snowflake costs. SELECT instantly and automatically reduces spend by as much as 30% in addition to user-actioned cost optimization recommendations. SELECT alerts you to any cost anomalies in your account for total peace of mind. If you’re interested in learning more, reach out for early access.