Exclude and rename columns when using SELECT * in Snowflake

Date: Sunday, December 18, 2022

Ian Whitestone
Co-founder & CEO of SELECT

Why do we need EXCLUDE & RENAME?

In Snowflake's November 2022 release, they quietly announced some exciting new SQL syntax, EXCLUDE and RENAME, which allows users to remove and rename specific columns when running a SELECT style query. This is particularly exciting, since it is very common when operating in non-production SQL workflows to run `SELECT ` style exploratory queries like:

1select *
2from table
3where id = 5

Users often do this in order to investigate a particular record or subset of records. But what happens when you want to exclude a specific column (like a large text column), or rename one of them? Users are forced to type out all of the columns they want, along with any desired renaming using the traditional AS syntax:

1select
2  column_1,
3  column_2,
4  column_3 as column_3_renamed,
5  column_4,
6  column_5,
7  column_6,
8  column_7,
9  column_8,
10  column_9, -- columns 10 and 11 are intentionally left out
11  column_12
12from table
13where id = 5

Not the best when you're trying to get answers, quickly.

How to exclude columns when running a SELECT * SQL statement in Snowflake?

Instead of typing out each column, a common ask¹ from users is the ability to specify a subset of columns to exclude from the table. In many cases, this is much more efficient to write since the number of columns to be excluded is often much lower than the number to be included. As of Snowflake's version 6.37 release, you can now include an EXCLUDE clause in SELECT * style SQL queries where you specify one or more columns that you don't want returned from the table:

1select
2  *
3  exclude(column_10, column_11)
4from table
5where id = 5

This can also be used to exclude a single column:

1select
2  *
3  exclude (column_10)
4from table
5where id = 5

Brackets are optional when excluding a single column. The SQL above can be rewritten as:

1select
2  *
3  exclude column_10
4from table
5where id = 5

Using Snowflake's EXCLUDE with multiple tables

Another common need for excluding columns is when joining multiple tables. Imagine you have a query like this²:

1select
2  orders.*,
3  customers.* exclude customer_id,
4  items.* exclude order_id
5from orders
6join customers
7  on orders.customer_id=customers.customer_id
8join items
9  on orders.order_id=items.order_id

In this example, we have two order_id columns and two customer_id columns in the output, since they are present in multiple tables. We can easily exclude these by changing our SQL to:

1select
2  orders.*,
3  customers.* exclude customer_id,
4  items.* exclude order_id
5from orders
6join customers
7  on orders.customer_id=customers.customer_id
8join items
9  on orders.order_id=items.order_id

Snowflake's equivalent of BigQuery's & Databrick's EXCEPT

Snowflake's EXCLUDE syntax is similar to BigQuery's and Databrick'sEXCEPT syntax:

1-- Snowflake
2select
3  *
4  exclude(column_10, column_11)
5from table
6
7-- BigQuery/Databricks
8select
9  *
10  except(column_10, column_11)
11from table

For anyone familiar with DuckDB, Snowflake follows the same EXCLUDE syntax that they use. Both databases presumably chose to avoid using the EXCEPT keyword to provide this functionality, since it is already used in set operations.

Should you use EXCLUDE in production SQL code?

While this new syntax is great for speeding up adhoc workflows, it is not recommended for production SQL code. It's better to be explicit about the exact columns you are selecting. Writing out all column names improves readability and auditability. Users reading your SQL code can immediately understand which columns you are using from which table. It can also lead to clearer errors. If a column is suddenly removed from a table, your SQL code will immediately fail. This is much better than a downstream application suddenly erroring due to a missing column.

Does using EXCLUDE in Snowflake result in a different query plan or performance?

No. Using the EXCLUDE keyword to remove a few columns does not result in a different query plan or query performance compared to explicitly writing out all columns. You can validate this behaviour by running both types of queries on a dataset in your Snowflake account and inspecting the query profile for each query.

How to rename columns when running a SELECT * SQL statement in Snowflake?

Prior to the new RENAME functionality, users were forced to write out every column name as soon as they needed to rename at least one when running a SELECT * query. Now, users can easily select all columns while renaming a subset:

1select
2  *
3  rename (column_3 as column_3_renamed, column_5 as column_5_renamed)
4from table
5where id = 5

This is much better than having to list out every field just to rename one or two:

1-- old method 👎
2select
3  column_1,
4  column_2,
5  column_3 as column_3_renamed,
6  column_4,
7  column_5 as column_5_renamed,
8  column_6,
9  ...
10  column_12
11from table
12where id = 5

Similar to EXCLUDE, this can also be done for a single column, brackets optional:

1select
2  *
3  rename (column_3 as column_3_renamed)
4from table
5where id = 5

Combining EXCLUDE and RENAME

EXCLUDE and RENAME can easily be combined, as shown in the example below:

1select
2  *
3  exclude(column_10, column_11)
4  rename column_3 as column_3_renamed
5from table
6where id = 5

Much better than the code from the beginning of the blog post:

1select
2  column_1,
3  column_2,
4  column_3 as column_3_renamed,
5  column_4,
6  column_5,
7  column_6,
8  column_7,
9  column_8,
10  column_9, -- columns 10 and 11 are intentionally left out
11  column_12
12from table
13where id = 5

Notes

¹To give you a sense of demand, this stack overflow question, asked almost 14 years ago, has 1.3 million views and over 1000 upvotes ↩

²Shoutout to Nate Sooter for motivating this example with his recent tweet! ↩