All posts

Effective Filtering of Window Functions With Snowflake QUALIFY

Date
  • Jeff Skoldberg
    Principal Consultant at Green Mountain Data Solutions

What is Qualify in Snowflake?

The qualify clause in Snowflake, or generally in SQL, enables filtering based on the result of a window function. A window function cannot be used in the where clause directly, so use qualify instead when you need to filter on window functions.

Quick Refresher on Window Functions

A window function is a function that operates on a set of rows in relation to the current row. They are often used to calculate ranks, running totals, moving averages, count by partitions, etc. Whenever you need to reference the rows above or below the current row, you need a window function. And the qualify clause is just a shortcut to filtering on a window function.

Qualify Syntax

The syntax for qualify is as follows:

SELECT <column_list>
FROM <data_source>
where <conditions>
[GROUP BY ...]
[HAVING ...]
QUALIFY <predicate>
ORDER BY

You can see that qualify comes at the end of the SQL query, right before the optional order by clause.

Example 1: Finding the Top Sales Rep

Based on the data below, let’s try to answer two business questions.

Finding the Top Sales Rep
  1. Who is the top sales rep in each region each day?
select
salesperson_id,
region,
sales_amount,
sales_date
from sales
qualify row_number() over (partition by region, sales_date order by sales_amount desc) = 1

Snowflake will return the highlighted rows:

Finding the Top Sales Rep
  1. Who is the top sales rep for each day?
select
salesperson_id,
region,
sales_amount,
sales_date
from sales
qualify row_number() over (partition by sales_date order by sales_amount desc) = 1

Snowflake will return the highlighted rows:

Finding the Top Sales Rep

Quick tip: Alias reuse works with the qualify clause! If you have the need to actually select the window function and not just filter on it, you can re-use the window function alias.

select
salesperson_id,
region,
sales_amount,
sales_date,
row_number() over (partition by sales_date order by sales_amount desc) sales_date_rank
from sales
qualify sales_date_rank = 1

Example 2: Finding Duplicate Records

Using the qualify clause is the fastest way to find which records are causing a unique test in dbt to fail, or isolate duplicates based on a combination of columns.

Let’s say you have a column called id that is tested for uniqueness and that test fails. You want analyze all of the duplicate records, with the duplicates sorted together.

The fastest way to select all of the records from the failed test is this:

select *
from my_table
qualify count(*) over (partition by id) > 1
order by id

Why is Qualify Useful? It is a Shortcut!

Let’s look at alternative ways to solve the duplicate records query.

Option 1:

-- use CTE with Group By instead
with duplicates as
(
select id
group by 1
having count(*) > 1
)
select *
from
my_table join
duplicates using (id);

Option 2

-- use CTE to create a window function, then fitler on the window function
with window_count as
(
select *, count(*) over (partition by id) as id_count
from my_table
)
select * exclude (id_count)
from window_count
where id_count > 1;

There are a few more ways we can solve this same question, but none of them are as quick and easy as using the qualify clause.

Key Takeaway: The main point of the qualify clause is it eliminates the need for a CTE when filtering on a window function. Since window functions cannot be used directly in the where clause, they would require a CTE for filtering if the qualify clause did not exist.

What is the difference between WHERE and Qualify in Snowflake?

We have already covered that qualify filters a window function and cannot be used in the where clause, but let’s dive deeper into the SQL order of operations.

This is the order in which Snowflake will process your select statement:

Finding the Top Sales Rep

It is important to note that any filtering in the from and where clause happen before any filtering in qualify. Also, any aggregation and aggregate filtering (group by and having) come before this as well.

Wrapping Up

Once again, the qualify clause in Snowflake is really just a shortcut that eliminates CTEs and sub-queries when you want to filter on a window function. I use this almost every day in my daily work for analyzing duplicates, removing duplicates (qualify row_number() … = 1 ), or advanced analytics scenarios.

Jeff Skoldberg
Principal Consultant at Green Mountain Data Solutions
Jeff Skoldberg is a Data and Analytics Consultant with 15+ years experience in automating insights and using data to control business processes. From a technology standpoint, he specializes in Snowflake + dbt + Tableau. From a business topic standpoint, he has experience in Public Utility, Clinical Trials, Publishing, CPG, and Manufacturing. Jeff adds unique value for his supply chain clients, as he is well versed in all topics related to planning, forecasting, inventory and supply chain KPIs. Reach out any time! [email protected]

Get up and running with SELECT in 15 minutes.

Snowflake optimization & cost management platform

Gain visibility into Snowflake usage, optimize performance and automate savings with the click of a button.

SELECT web application screenshot

Want to hear about our latest Snowflake learnings? 
Subscribe to get notified.