Exclude and rename columns when using SELECT * in Snowflake
- Date
- Ian WhitestoneCo-founder & CEO of SELECT
Why do we need EXCLUDE & RENAME?
In Snowflake's November 2022 release, they quietly announced some exciting new SQL syntax, EXCLUDE
and RENAME
, which allows users to remove and rename specific columns when running a SELECT * style query. This is particularly exciting, since it is very common when operating in non-production SQL workflows to run SELECT *
style exploratory queries like:
select *
from table
where id = 5
Users often do this in order to investigate a particular record or subset of records. But what happens when you want to exclude a specific column (like a large text column), or rename one of them? Users are forced to type out all of the columns they want, along with any desired renaming using the traditional AS
syntax:
select
column_1,
column_2,
column_3 as column_3_renamed,
column_4,
column_5,
column_6,
column_7,
column_8,
column_9, -- columns 10 and 11 are intentionally left out
column_12
from table
where id = 5
Not the best when you're trying to get answers, quickly.
How to exclude columns when running a SELECT * SQL statement in Snowflake?
Instead of typing out each column, a common ask
1 from users is the ability to specify a subset of columns to exclude from the table. In many cases, this is much more efficient to write since the number of columns to be excluded is often much lower than the number to be included. As of Snowflake's version 6.37 release, you can now include an EXCLUDE
clause in SELECT *
style SQL queries where you specify one or more columns that you don't want returned from the table:
select
*
exclude(column_10, column_11)
from table
where id = 5
This can also be used to exclude a single column:
select
*
exclude (column_10)
from table
where id = 5
Brackets are optional when excluding a single column. The SQL above can be rewritten as:
select
*
exclude column_10
from table
where id = 5
Using Snowflake's EXCLUDE with multiple tables
Another common need for excluding columns is when joining multiple tables. Imagine you have a query like this 2:
select *
from orders
join customers
on orders.customer_id=customers.customer_id
join items
on orders.order_id=items.order_id
In this example, we have two order_id
columns and two customer_id
columns in the output, since they are present in multiple tables. We can easily exclude these by changing our SQL to:
select
orders.*,
customers.* exclude customer_id,
items.* exclude order_id
from orders
join customers
on orders.customer_id=customers.customer_id
join items
on orders.order_id=items.order_id
Snowflake's equivalent of BigQuery's & Databrick's EXCEPT
Snowflake's EXCLUDE
syntax is similar to BigQuery's and Databrick's EXCEPT
syntax:
-- Snowflake
select
*
exclude(column_10, column_11)
from table
-- BigQuery/Databricks
select
*
except(column_10, column_11)
from table
For anyone familiar with DuckDB, Snowflake follows the same EXCLUDE
syntax that they use. Both databases presumably chose to avoid using the EXCEPT
keyword to provide this functionality, since it is already used in set operations.
Should you use EXCLUDE in production SQL code?
While this new syntax is great for speeding up adhoc workflows, it is not recommended for production SQL code. It's better to be explicit about the exact columns you are selecting. Writing out all column names improves readability and auditability. Users reading your SQL code can immediately understand which columns you are using from which table. It can also lead to clearer errors. If a column is suddenly removed from a table, your SQL code will immediately fail. This is much better than a downstream application suddenly erroring due to a missing column.
Does using EXCLUDE in Snowflake result in a different query plan or performance?
No. Using the EXCLUDE
keyword to remove a few columns does not result in a different query plan or query performance compared to explicitly writing out all columns. You can validate this behaviour by running both types of queries on a dataset in your Snowflake account and inspecting the query profile for each query.
How to rename columns when running a SELECT * SQL statement in Snowflake?
Prior to the new RENAME
functionality, users were forced to write out every column name as soon as they needed to rename at least one when running a SELECT *
query. Now, users can easily select all columns while renaming a subset:
select
*
rename (column_3 as column_3_renamed, column_5 as column_5_renamed)
from table
where id = 5
This is much better than having having to list out every field just to rename one or two:
-- old method 👎
select
column_1,
column_2,
column_3 as column_3_renamed,
column_4,
column_5 as column_5_renamed,
column_6,
...
column_12
from table
where id = 5
Similar to EXCLUDE
, this can also be done for a single column, brackets optional:
select
*
rename (column_3 as column_3_renamed)
from table
where id = 5
Combining EXCLUDE and RENAME
EXCLUDE
and RENAME
can easily be combined, as shown in the example below:
select
*
exclude(column_10, column_11)
rename column_3 as column_3_renamed
from table
where id = 5
Much better than the code from the beginning of the blog post:
select
column_1,
column_2,
column_3 as column_3_renamed,
column_4,
column_5,
column_6,
column_7,
column_8,
column_9, -- columns 10 and 11 are intentionally left out
column_12
from table
where id = 5
Notes
1 To give you a sense of demand, this stack overflow question, asked almost 14 years ago, has 1.3 million views and over 1000 upvotes. ↩
2 Shoutout to Nate Sooter for motivating this example with his recent tweet! ↩