Understand and optimize total asset cost with SELECT’s data lineage
- Date
- Greg Finley, Software Engineer at SELECT
- Adam Wisniewski, Software Engineer at SELECT
- Niall Woodward, Co-founder & CTO of SELECT
We’re excited to announce the release of data lineage to help users understand their total data asset cost, ensure ROI from their data products and ultimately optimize costs.
The need for data lineage in cost management
Building data products, whether a dashboard, a machine learning model or a dataset you sell, is a complex process involving many different components. Data is usually loaded into a staging area, then transformed through multiple stages before a final dataset is surfaced to end users.
If we consider a single BI dashboard built for a business user, the total cost of that asset can involve:
- Costs of loading data into Snowflake, whether through a service like Snowpipe or self-managed compute
- Storage costs for the raw data that has been loaded
- Compute costs for each Dynamic Table, dbt model, Task or Stored Procedure used to process that data
- Storage costs for each of the intermediate datasets produced in the process
- Additional storage costs from Fail-safe and Time Travel
- Compute costs from data quality checks being run on each table
- Additional costs from automatic clustering or search optimization enabled on any tables to improve performance
- Compute costs from the final queries issued by that BI dashboard
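To make the idea of a "total asset cost" concrete, here is a minimal sketch that walks a lineage graph upstream from one asset and sums the cost of every resource it depends on. The resource names, the graph and the monthly cost figures are all hypothetical, and this is an illustration of the concept rather than how SELECT computes it.

```python
from collections import deque

# Hypothetical lineage graph: each resource maps to the resources it reads from.
UPSTREAM: dict[str, list[str]] = {
    "sales_dashboard": ["mart_sales"],
    "mart_sales": ["stg_orders", "stg_customers"],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "raw_orders": [],
    "raw_customers": [],
}

# Hypothetical monthly cost (USD) per resource, with compute, storage,
# clustering, data quality checks, etc. already summed for each one.
MONTHLY_COST: dict[str, float] = {
    "sales_dashboard": 120.0,
    "mart_sales": 300.0,
    "stg_orders": 80.0,
    "stg_customers": 20.0,
    "raw_orders": 50.0,
    "raw_customers": 10.0,
}


def upstream_closure(asset: str, upstream: dict[str, list[str]]) -> set[str]:
    """Return the asset plus every resource it transitively depends on."""
    seen = {asset}
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in seen:  # a shared upstream resource is counted once
                seen.add(parent)
                queue.append(parent)
    return seen


def total_asset_cost(asset: str, upstream: dict[str, list[str]],
                     cost: dict[str, float]) -> float:
    """Sum the cost of the asset and everything upstream of it."""
    return sum(cost.get(node, 0.0) for node in upstream_closure(asset, upstream))


print(total_asset_cost("sales_dashboard", UPSTREAM, MONTHLY_COST))  # prints 580.0
```

This attributes the full cost of each shared upstream resource to every asset that depends on it, so adding up the totals of several dashboards that share `mart_sales` would double count it; splitting shared costs between consumers is a separate design decision.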
SELECT’s data lineage feature brings these end-to-end costs together in one place.
When tables and queries are viewed in isolation, opportunities to cut costs by removing unnecessary resources are easy to overlook, because each individual resource rarely costs enough on its own to stand out.
With SELECT’s data lineage feature, users can now easily see the total cost of all these interconnected processes and spot new optimization opportunities like:
- Removing an entire group of related resources which aren’t driving enough business value relative to their combined cost
- Identifying resources that are updating more frequently than required by downstream users
- Spotting unnecessary costs they were not aware of
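As an illustration of the second opportunity, the sketch below flags resources that are rebuilt more often than any of their direct downstream consumers read them. The graph, the refresh intervals and the "faster than the fastest consumer" rule are hypothetical assumptions chosen for the example, not a description of how SELECT detects this.

```python
# Hypothetical lineage: each resource maps to the resources that read from it.
DOWNSTREAM: dict[str, list[str]] = {
    "stg_orders": ["mart_sales"],
    "mart_sales": ["sales_dashboard"],
    "sales_dashboard": [],
}

# Hypothetical refresh interval in minutes: how often each table is rebuilt,
# or, for a dashboard, how often its queries run.
REFRESH_MINUTES: dict[str, int] = {
    "stg_orders": 15,
    "mart_sales": 60,
    "sales_dashboard": 1440,  # once per day
}


def over_refreshed(downstream: dict[str, list[str]],
                   refresh_minutes: dict[str, int]) -> list[tuple[str, int, int]]:
    """Flag resources rebuilt more often than their fastest direct consumer.

    Returns (resource, own interval, fastest consumer interval) tuples.
    """
    flagged = []
    for node, consumers in downstream.items():
        if not consumers:
            continue  # nothing downstream to compare against
        fastest_consumer = min(refresh_minutes[c] for c in consumers)
        if refresh_minutes[node] < fastest_consumer:
            flagged.append((node, refresh_minutes[node], fastest_consumer))
    return flagged


for name, own, needed in over_refreshed(DOWNSTREAM, REFRESH_MINUTES):
    print(f"{name}: refreshes every {own} min, consumers read every {needed} min")
```

The rule only looks one hop downstream and ignores how refresh times line up, so slowing `mart_sales` to daily would reveal that `stg_orders` could slow down further on a second pass.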
Additional Benefits
Outside of understanding and optimizing total data asset cost, there are a number of other helpful benefits users can derive from the feature:
- Easily understand the upstream and downstream dependencies of a given Snowflake resource
- Visualize all processes updating and reading from a table
- Quickly drill into table and workload statistics from the resource sidebar without switching between pages
Lineage is not available to Snowflake Standard Edition Customers
Our lineage feature relies on the Snowflake access history view, which is not available to customers on the Standard Edition of Snowflake.
Head to our lineage documentation to learn more.
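For readers curious how an access history can yield lineage at all, here is a minimal sketch: every query that reads some objects and modifies others implies an edge from each object read to each object written. The rows are simplified and modelled loosely on the `BASE_OBJECTS_ACCESSED` and `OBJECTS_MODIFIED` columns of Snowflake's `ACCOUNT_USAGE.ACCESS_HISTORY` view, assumed to be already parsed from JSON; the object names are hypothetical and this is not SELECT's implementation.

```python
from collections import defaultdict

# Simplified, hypothetical rows shaped like ACCESS_HISTORY records with the
# array columns already parsed. Real rows carry many more fields.
ROWS = [
    {
        "query_id": "q1",
        "base_objects_accessed": [{"objectName": "DB.RAW.ORDERS", "objectDomain": "Table"}],
        "objects_modified": [{"objectName": "DB.STG.ORDERS", "objectDomain": "Table"}],
    },
    {
        "query_id": "q2",
        "base_objects_accessed": [
            {"objectName": "DB.STG.ORDERS", "objectDomain": "Table"},
            {"objectName": "DB.STG.CUSTOMERS", "objectDomain": "Table"},
        ],
        "objects_modified": [{"objectName": "DB.MART.SALES", "objectDomain": "Table"}],
    },
    {
        # A read-only query (for example, from a dashboard) modifies nothing.
        "query_id": "q3",
        "base_objects_accessed": [{"objectName": "DB.MART.SALES", "objectDomain": "Table"}],
        "objects_modified": [],
    },
]


def lineage_edges(rows: list[dict]) -> dict[str, set[str]]:
    """Map each source object to the set of objects written from it."""
    edges: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        targets = [obj["objectName"] for obj in row["objects_modified"]]
        for src in row["base_objects_accessed"]:
            for target in targets:
                if src["objectName"] != target:  # skip self-loops such as in-place updates
                    edges[src["objectName"]].add(target)
    return dict(edges)


print(lineage_edges(ROWS))
```

Read-only queries such as `q3` produce no table-to-table edges; linking them to a specific dashboard would need some way of attributing queries to it, which this sketch leaves out.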
Other Things We Shipped
- 🐛 Fixed a bug that caused the latest dbt run details to not display all steps
- 🐛 Resolved a bug that prevented some cost values from appearing properly in the Monitors configuration
- 🪄 Allow users to configure digest monitors to only send if results are returned
- 🪄 Pick your dataset before you pick your monitor type in monitors
- 🪄 Users can now preview monitors before selecting a destination
- 🪄 Added the ability to exclude the spend summary paragraph included in monitor messages