The qualify clause in Snowflake, or generally in SQL, enables filtering based on the result of a window function. A window function cannot be used in the where
clause directly, so use qualify
instead when you need to filter on window functions.
A window function is a function that operates on a set of rows in relation to the current row. They are often used to calculate ranks, running totals, moving averages, count by partitions, etc. Whenever you need to reference the rows above or below the current row, you need a window function. And the qualify clause is just a shortcut to filtering on a window function.
The syntax for qualify
is as follows:
SELECT <column_list>
FROM <data_source>
where <conditions>
[GROUP BY ...]
[HAVING ...]
QUALIFY <predicate>
ORDER BY
You can see that qualify
comes at the end of the SQL query, right before the optional order by
clause.
Based on the data below, let’s try to answer two business questions.
select
salesperson_id,
region,
sales_amount,
sales_date
from sales
qualify row_number() over (partition by region, sales_date order by sales_amount desc) = 1
Snowflake will return the highlighted rows:
2. Who is the top sales rep for each day?
select
salesperson_id,
region,
sales_amount,
sales_date
from sales
qualify row_number() over (partition by sales_date order by sales_amount desc) = 1
Snowflake will return the highlighted rows:
Quick tip: Alias reuse works with the qualify clause! If you have the need to actually select the window function and not just filter on it, you can re-use the window function alias.
select
salesperson_id,
region,
sales_amount,
sales_date,
row_number() over (partition by sales_date order by sales_amount desc) sales_date_rank
from sales
qualify sales_date_rank = 1
Using the qualify
clause is the fastest way to find which records are causing a unique test in dbt to fail, or isolate duplicates based on a combination of columns.
Let’s say you have a column called id
that is tested for uniqueness and that test fails. You want analyze all of the duplicate records, with the duplicates sorted together.
The fastest way to select all of the records from the failed test is this:
select *
from my_table
qualify count(*) over (partition by id) > 1
order by id
Let’s look at alternative ways to solve the duplicate records query.
Option 1:
-- use CTE with Group By instead
with duplicates as
(
select id
group by 1
having count(*) > 1
)
select *
from
my_table join
duplicates using (id);
Option 2
-- use CTE to create a window function, then filter on the window function
with window_count as
(
select *, count(*) over (partition by id) as id_count
from my_table
)
select * exclude (id_count)
from window_count
where id_count > 1;
There are a few more ways we can solve this same question, but none of them are as quick and easy as using the qualify
clause.
Key Takeaway: The main point of the qualify
clause is it eliminates the need for a CTE when filtering on a window function. Since window functions cannot be used directly in the where
clause, they would require a CTE for filtering if the qualify
clause did not exist.
We have already covered that qualify
filters a window function and cannot be used in the where clause, but let’s dive deeper into the SQL order of operations.
This is the order in which Snowflake will process your select
statement:
It is important to note that any filtering in the from
and where
clause happen before any filtering in qualify
. Also, any aggregation and aggregate filtering (group by
and having
) come before this as well.
Once again, the qualify
clause in Snowflake is really just a shortcut that eliminates CTEs and sub-queries when you want to filter on a window function. I use this almost every day in my daily work for analyzing duplicates, removing duplicates (qualify row_number() … = 1
), or advanced analytics scenarios.
Jeff Skoldberg is a Sales Engineer at SELECT, helping customers get maximum value out of the SELECT app to reduce their Snowflake spend. Prior to joining SELECT, Jeff was a Data and Analytics Consultant with 15+ years experience in automating insights and using data to control business processes. From a technology standpoint, he specializes in Snowflake + dbt + Tableau. From a business topic standpoint, he has experience in Public Utility, Clinical Trials, Publishing, CPG, and Manufacturing.
Get up and running with SELECT in 15 minutes.
Gain visibility into Snowflake usage, optimize performance and automate savings with the click of a button.