Lineage
SELECT’s end-to-end data lineage feature helps users:
- Easily understand the upstream and downstream dependencies of a given Snowflake resource
- Quickly understand resource usage and costs from the Lineage sidebar
- Understand total data asset costs and identify optimization opportunities
Understand data dependencies
SELECT’s lineage feature can be accessed from any workload or table page in the UI. By default, SELECT will load all direct upstream and downstream dependencies.
Users can expand the lineage graph by clicking the buttons on each node.
SELECT’s lineage feature shows both datasets (tables) and workloads (queries, dbt models, tasks, etc.). Tables are shown as rectangles, whereas workloads are shown as ovals.
If you just want to analyze data dependencies and not see workload details, you can select the “Collapse Workloads” option to minimize the workload nodes.
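To make the graph model concrete, here is a minimal Python sketch of a lineage graph with table and workload nodes. It shows how direct upstream and downstream dependencies can be looked up, and what collapsing workloads amounts to: table → workload → table paths become table → table edges. The resource names, the `EDGES` list, and all function names are hypothetical and chosen for illustration; this is not SELECT’s implementation.

```python
from collections import defaultdict

# Hypothetical lineage edges (source, target): tables feed workloads,
# and workloads write to tables.
EDGES = [
    ("raw.orders", "dbt: stg_orders"),
    ("dbt: stg_orders", "analytics.stg_orders"),
    ("analytics.stg_orders", "task: daily_revenue"),
    ("task: daily_revenue", "analytics.daily_revenue"),
]
# Hypothetical set of nodes that are workloads (drawn as ovals in the UI).
WORKLOADS = {"dbt: stg_orders", "task: daily_revenue"}


def build_adjacency(edges):
    """Index edges by node so neighbours can be looked up in both directions."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def direct_dependencies(node, edges):
    """Return (direct upstream, direct downstream) neighbours of a node."""
    upstream, downstream = build_adjacency(edges)
    return sorted(upstream[node]), sorted(downstream[node])


def collapse_workloads(edges, workloads):
    """Drop workload nodes and connect their input tables to their output tables.

    Assumes workloads are never chained directly to other workloads.
    """
    upstream, downstream = build_adjacency(edges)
    collapsed = set()
    # Keep any edge that already connects two tables.
    for src, dst in edges:
        if src not in workloads and dst not in workloads:
            collapsed.add((src, dst))
    # Bridge each workload: every input table points to every output table.
    for workload in workloads:
        for src in upstream[workload]:
            for dst in downstream[workload]:
                if src not in workloads and dst not in workloads:
                    collapsed.add((src, dst))
    return sorted(collapsed)
```

Calling `direct_dependencies("analytics.stg_orders", EDGES)` returns the dbt model as the only upstream node and the task as the only downstream node, while `collapse_workloads(EDGES, WORKLOADS)` leaves only the three tables connected to each other.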
Understand resource details with the sidebar
Users can quickly get more context without navigating away from the lineage view by clicking on any of the nodes. A sidebar will automatically open containing information about the asset, its cost trends, and associated users.
Understand total data asset cost
When tables and queries are viewed in isolation, opportunities to cut costs by removing unnecessary resources are often overlooked, because the savings from any single resource are rarely significant.
With SELECT’s data lineage feature, users can see the total cost of all these interconnected resources in a single view.
Using this view, users can spot new optimization opportunities like:
- Removing an entire group of related resources which aren’t driving enough business value relative to their combined cost
- Identifying resources that are updating more frequently than required by downstream users
- Spotting unnecessary costs they were not aware of
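The idea of a combined cost can be sketched in Python as follows: starting from one resource, walk the lineage graph in both directions and sum the cost of everything reachable. The edges, the `MONTHLY_COST` figures, and the function names are made-up illustrations, not real data or SELECT’s own calculation.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source, target).
EDGES = [
    ("raw.orders", "dbt: stg_orders"),
    ("dbt: stg_orders", "analytics.stg_orders"),
    ("analytics.stg_orders", "task: daily_revenue"),
    ("task: daily_revenue", "analytics.daily_revenue"),
]
# Hypothetical monthly cost per resource, in whole dollars.
MONTHLY_COST = {
    "raw.orders": 40,
    "dbt: stg_orders": 35,
    "analytics.stg_orders": 25,
    "task: daily_revenue": 30,
    "analytics.daily_revenue": 20,
    "analytics.unrelated": 500,  # not connected to the lineage above
}


def connected_resources(start, edges):
    """Breadth-first search over lineage edges, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def total_cost(start, edges, cost):
    """Sum the cost of every resource connected to `start`."""
    return sum(cost.get(node, 0) for node in connected_resources(start, edges))
```

In this made-up example no single resource costs more than 40 dollars a month, yet `total_cost("analytics.daily_revenue", EDGES, MONTHLY_COST)` adds up to 150. That gap is why a group of related resources can be worth removing even when each one looks cheap on its own.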
Refine the lineage view
Users can refine the lineage view to their needs using the following options:
- Collapse the oval workload nodes to focus purely on the data dependencies and flows
- View lineage at a different point in time using the date range selector
- Exclude development databases/schemas or workloads run by certain users using the available filters
- Hide the cost statistics shown on each node for a simplified dependency view
Limitations
There are currently a few scenarios where data lineage will not be generated for a given resource in SELECT:
- The workload has no upstream or downstream dependencies. An example is a query that writes data to an external stage.
- Snowpipe, Dynamic Tables, Streams and Stages are not covered in the initial release of SELECT lineage. As a result, you will not see lineage for these resources, nor will they appear in the lineage of other resources that rely on them.
Lineage is not available for Snowflake Standard Edition customers
Our lineage feature relies on the Snowflake access history view, which is not available to customers on the Standard Edition of Snowflake.
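To illustrate why access history matters for lineage, the Python sketch below turns simplified access-history-style records into lineage edges: each query becomes a workload node, the objects it read become upstream edges, and the objects it wrote become downstream edges. The record layout is an assumed, simplified stand-in for rows of Snowflake’s ACCESS_HISTORY view (the lower-case keys and sample values are hypothetical), and the function does not represent how SELECT actually processes this data.

```python
def edges_from_access_history(records):
    """Derive (source, target) lineage edges from simplified access records.

    Each record is assumed to be a dict with a query ID, the objects the
    query read, and the objects it modified.
    """
    edges = set()
    for record in records:
        workload = f"query:{record['query_id']}"
        # Objects read by the query sit upstream of the workload.
        for obj in record.get("base_objects_accessed", []):
            edges.add((obj["objectName"], workload))
        # Objects written by the query sit downstream of the workload.
        for obj in record.get("objects_modified", []):
            edges.add((workload, obj["objectName"]))
    return sorted(edges)


# Hypothetical sample record for illustration only.
SAMPLE_RECORDS = [
    {
        "query_id": "01a",
        "base_objects_accessed": [{"objectName": "RAW.ORDERS"}],
        "objects_modified": [{"objectName": "ANALYTICS.STG_ORDERS"}],
    },
]
```

Without records of this kind, there is nothing to connect a query to the tables it reads and writes, so no graph can be built.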