Setup Guide
Snowflake Metadata Access
As outlined in our security documentation, SELECT requires a new user to be created with read-only access to your Snowflake account metadata database. In the sections below, we'll walk you through the access commands step by step. Alternatively, you can skip ahead to the full script and run it in one go.
Ensure you have access to run these commands!
In order to run this setup script, your user will need to have access to the accountadmin
role. If you're not sure whether you have this, you can run show grants to user <your_username>
to check.
Step 1: Create new user & role
First off, create a new user and role that will only be used by SELECT. You'll need to update the PASSWORD_HERE
with a randomly generated password
1. Alternatively, configure key-pair authentication. Be sure to save the credentials, since you'll need to enter it into the SELECT web application later on.
use role accountadmin;
create user if not exists select_user
password = 'PASSWORD_HERE'
default_role = select_user
default_warehouse = select_user
comment = 'Used by select.dev';
create role if not exists select_user comment = 'Used by select.dev';
grant role select_user to user select_user;
Step 2: Create a new warehouse
Next, create a new warehouse for SELECT. SELECT will use this warehouse to query your Snowflake metadata once per day. The cost of running this warehouse will be minimal. If preferred, you can instead grant us access to run the queries from an existing warehouse 2.
use role accountadmin;
create warehouse if not exists select_user
warehouse_size=xsmall
auto_suspend=60
initially_suspended=true
comment = 'Used by select.dev';
grant monitor, operate, usage, modify
on warehouse select_user to role select_user;
Step 3: Create a storage integration
To export your account's metadata in the fastest and most cost-efficient way, we export directly to a secure Google Cloud Storage (GCS) bucket. The commands below create a storage integration and allows the select_user
role to write the Snowflake metadata to SELECT's GCS bucket using it.
use role accountadmin;
create storage integration if not exists select_export
type = external_stage
storage_provider = 'GCS' -- leave this as GCS, even if your account is AWS/Azure
enabled = true
storage_allowed_locations = ('*');
grant usage on integration select_export to role select_user;
If you'd like to restrict the storage_allowed_locations
to just the required bucket, see the tooltip on the add account form for the required SQL for your account. Optionally, a named stage can be used if you've set PREVENT_UNLOAD_TO_INLINE_URL=true
, the SQL for which is again auto-generated on the add account form.
Google Cloud Storage is used as an intermediary between SELECT's Snowflake account and the customer's. This is done because the Snowflake account usage views used by the product cannot be data shared directly, and the customer accounts may be in different regions which would require setting up data replication. Using Google Cloud Storage makes setup a lot simpler and more cost-effective for the customer.
As per our security documentation, each customer's Snowflake metadata is stored in a dedicated Google Cloud Storage bucket. Data is stored in the us-east-1 Google Cloud region. There is a data egress cost charged by Snowflake, but it's incredibly small for the metadata volumes we're processing (the highest we've seen across our customer base is $20 per year).
Step 4: Grant user read only access to Snowflake metadata
Next, we'll grant select_user
read only access to the Snowflake metadata database. We also give it permission to run show warehouses
commands.
-- Allows SELECT to query Snowflake metadata
grant imported privileges on database snowflake to role select_user;
-- Allows SELECT to monitor warehouses
grant monitor usage on account to role select_user;
Step 5: Ensure your account has ORGADMIN access
SELECT relies on a few views from the snowflake.organization_usage
schema. For example, snowflake.organization_usage.rate_sheet_daily
is used to look up your daily rate for each Snowflake service (i.e. your cost per credit for compute, cost per terabyte for storage, etc).
Please run the following SQL statement to check if your account has these views populated:
use role accountadmin;
select *
from snowflake.organization_usage.rate_sheet_daily;
If records are returned, you're all set and can move on to the next step.
If no records are returned, the ORGADMIN role likely isn't enabled for the account. To enable it, log into a Snowflake account in your organization which has ORGADMIN enabled (at least one account will), then head to Admin -> Accounts in the sidebar. Then, enable ORGADMIN for the account you're connecting to SELECT.
It can take up to 1 day for these views become populated with data after running this statement. As a result, you may not see data in your Snowflake account until the morning after connecting your account. Reach out to our team using the Intercom chat on the website if you're still not seeing data after 1 day.
Purchased your Snowflake credits through a reseller?
Users who purchased their Snowflake contract/credits through a reseller (i.e. AWS Marketplace) will not have access to the organization_usage
views even after running the command above. Please share a copy of your latest Snowflake invoice with the SELECT team (via chat or by emailing [email protected]) and we will manually input the applicable rates for your account.
Step 6: Grant user ability to monitor warehouses
Lastly, we give select_user
the ability to monitor warehouse settings through the manage warehouses permission. This permission is also required for SELECT to retrieve query statistics using get_query_operator_stats.
grant manage warehouses on account to role select_user;
Step 7: Create a stored procedure for describing objects and fetching clustering information
Some features in SELECT require information about objects in your account by:
- Running a describe table statement in your Snowflake account to fetch additional metadata from columns in a particular object (table, view, materialized table, etc).
- Running Snowflake's system function system$clustering_information to fetch metadata about the clustering health status of columns in a table.
Instead of granting the necessary permissions to the select_user
on all your existing objects and using future grants to also include new objects (which can conflict with your existing future grants), we ask our customers to create a custom stored procedure for when SELECT needs to run the commands above.
You can create the stored procedure in any schema and with any name, but it needs to be created with the accountadmin
role, as SELECT relies on the execute as owner
property. Take note of which database name, schema name, and stored procedure name you chose, as you will need it in further steps. The stored procedure only describes objects and retrieves clustering information. It does not access any data from any tables.
-- If you want to isolate the stored procedure on its own database & schema, you can create a new one
create database if not exists select_utils;
grant usage on database select_utils to role select_user;
create schema if not exists select_utils.utils;
grant usage on schema select_utils.utils to role select_user;
create or replace procedure select_utils.utils.sproc_describe_object (
object_type varchar, -- Generic, so SELECT can describe tables, dynamic tables and materialized views
object_fqn varchar, -- The full path of the object to be described (database_name.schema_name.object_name)
run_describe_object boolean, -- Whether to call "describe table" or not
add_clustering_information boolean, -- Whether to include clustering information in the results or not
columns_filter_clustering_information array -- Sometimes SELECT only needs clustering information on certain columns
)
returns variant
language javascript
execute as owner
as $$
/**
* Function to call "describe table" and parse the results.
*/
function describeObject(objectType, objectFqn) {
const sql = `describe ${objectType} ${objectFqn}`
const statement = snowflake.createStatement({ sqlText: sql })
const resultSet = statement.execute()
const columnsInResult = []
for (let columnIndex = 0; columnIndex < statement.getColumnCount(); columnIndex++) {
columnsInResult.push(statement.getColumnName(columnIndex + 1)) // Starts from 1
}
const output = [];
for (let rowIndex = 0; resultSet.next(); rowIndex++) {
let row = { _index: rowIndex }
columnsInResult.forEach(column => row[column] = resultSet.getColumnValue(column))
output.push(row)
}
return output
}
/**
* Function to call "system$clustering_information" and parse the results.
* Snowflake requires one call per column, but we combine all calls into one SQL statement
* using UNION ALL.
*/
function extendColumnsWithClusteringInformation(tableFqn, columns, runDescribeObject, columnsFilterClusteringInformation) {
let filteredColumns
if (runDescribeObject) { // If describe object has been called
// Filter out unsupported column types (causes errors on Snowflake)
filteredColumns = columns.filter(column => !['OBJECT', 'VARIANT', 'ARRAY'].includes(column.type))
// If a list of columns to filter on is provided, use it
if (columnsFilterClusteringInformation)
filteredColumns = filteredColumns.filter(column => columnsFilterClusteringInformation.includes(column.name))
} else { // If describe object hasn't been called, use the columns provided in the filter
// Convert it to the expected format
filteredColumns = columnsFilterClusteringInformation.map(columnName => ({ name: columnName }))
}
if (!filteredColumns.length) return
const sql = filteredColumns.map(column =>
`select ${column?._index || 0} as column_index, '${column.name}' as column_name, system$clustering_information('${tableFqn}', '("${column.name}")') as clustering_information`
).join("\nunion all\n")
const resultSet = snowflake.execute({ sqlText: sql })
while (resultSet.next()) {
let colIndex = resultSet.getColumnValue(1)
let clusteringInformation = JSON.parse(resultSet.getColumnValue(3))
if (runDescribeObject) { // If there's already data available from describe object, extend it
columns[colIndex] = { ...columns[colIndex], clustering_information: clusteringInformation }
} else { // Otherwise, append to the array
columns.push({ clustering_information: clusteringInformation })
}
}
}
/**
* Function to call "system$clustering_information" without any expression, which defaults
* to the existing clustering key if it's defined in the table.
*/
function fetchClusteringKeyInformation(tableFqn) {
try {
const resultSet = snowflake.execute({ sqlText: `select system$clustering_information('${tableFqn}')` })
resultSet.next()
return JSON.parse(resultSet.getColumnValue(1))
} catch (err) {
// "Invalid clustering keys or table ... is not clustered"
return null;
}
}
let columns = null
let clusteringKeyClusteringInformation = null
if (RUN_DESCRIBE_OBJECT) {
columns = describeObject(OBJECT_TYPE, OBJECT_FQN)
// If the ADD_CLUSTERING_INFORMATION argument is true, append the output with
// the results from system$clustering_information
if (ADD_CLUSTERING_INFORMATION) {
extendColumnsWithClusteringInformation(OBJECT_FQN, columns, true, COLUMNS_FILTER_CLUSTERING_INFORMATION)
clusteringKeyClusteringInformation = fetchClusteringKeyInformation(OBJECT_FQN)
}
} else if (ADD_CLUSTERING_INFORMATION && COLUMNS_FILTER_CLUSTERING_INFORMATION) {
// If only one column is passed as a filter and it is an empty string, fetch clustering data about
// the clustering key
if (COLUMNS_FILTER_CLUSTERING_INFORMATION?.length === 1 && COLUMNS_FILTER_CLUSTERING_INFORMATION[0] === '') {
clusteringKeyClusteringInformation = fetchClusteringKeyInformation(OBJECT_FQN)
} else {
columns = []
extendColumnsWithClusteringInformation(OBJECT_FQN, columns, false, COLUMNS_FILTER_CLUSTERING_INFORMATION)
}
}
return { clusteringKey: clusteringKeyClusteringInformation, columns: columns, version: 'v1' }
$$;
grant usage on procedure select_utils.utils.sproc_describe_object(varchar, varchar, boolean, boolean, array) to role select_user;
The stored procedure can only access metadata about columns in your tables, and it can never access any of your data, as shown by the output of the procedure which only returns table metadata:
return { clusteringKey: clusteringKeyClusteringInformation, columns: columns, version: 'v1' }
If you want to validate the data returned from stored procedure yourself, you can call it by running:
-- Pass the full path of a table on the second argument
call select_utils.utils.sproc_describe_object('table', 'snowflake_sample_data.tpch_sf1.customer', true, true, null);
Full Script
After updating the PASSWORD_HERE
in the second command, you can run the entire script at once.
-- Step 1: Create new user
use role accountadmin;
create user if not exists select_user
password = 'PASSWORD_HERE'
default_role = select_user
default_warehouse = select_user
comment = 'Used by select.dev';
create role if not exists select_user comment = 'Used by select.dev';
grant role select_user to user select_user;
-- Step 2: Create a new X-SMALL warehouse for SELECT to use when reading Snowflake metadata
create warehouse if not exists select_user
warehouse_size=xsmall
auto_suspend=60
initially_suspended=true
comment = 'Used by select.dev';
grant usage on warehouse select_user to role select_user;
-- Step 3: Create a storage integration in GCS so SELECT can export metadata required to power cost dashboards
-- Used to power dashboards in select.dev
create storage integration if not exists select_export
type = external_stage
storage_provider = 'GCS' -- leave this as GCS, even if your account is AWS/Azure
enabled = true
storage_allowed_locations = ('*');
grant usage on integration select_export to role select_user;
-- Step 4: Grant SELECT_USER required privileges
-- SELECT will get read only access to Snowflake account metadata database:
-- https://docs.snowflake.com/en/sql-reference/snowflake-db.html
-- Allows SELECT to query Snowflake metadata
grant imported privileges on database snowflake to role select_user;
-- Allows SELECT to monitor warehouses
grant monitor usage on account to role select_user;
-- Step 6: Give SELECT the ability to modify each warehouse in your account and retrieve query statistics
-- through get_query_operator_stats
grant manage warehouses on account to role select_user;
-- Step 7: Create a stored procedure for describing objects and fetching clustering information
create database if not exists select_utils;
grant usage on database select_utils to role select_user;
create schema if not exists select_utils.utils;
grant usage on schema select_utils.utils to role select_user;
create or replace procedure select_utils.utils.sproc_describe_object (
object_type varchar, -- Generic, so SELECT can describe tables, dynamic tables and materialized views
object_fqn varchar, -- The full path of the object to be described (database_name.schema_name.object_name)
run_describe_object boolean, -- Whether to call "describe table" or not
add_clustering_information boolean, -- Whether to include clustering information in the results or not
columns_filter_clustering_information array -- Sometimes SELECT only needs clustering information on certain columns
)
returns variant
language javascript
execute as owner
as $$
/**
* Function to call "describe table" and parse the results.
*/
function describeObject(objectType, objectFqn) {
const sql = `describe ${objectType} ${objectFqn}`
const statement = snowflake.createStatement({ sqlText: sql })
const resultSet = statement.execute()
const columnsInResult = []
for (let columnIndex = 0; columnIndex < statement.getColumnCount(); columnIndex++) {
columnsInResult.push(statement.getColumnName(columnIndex + 1)) // Starts from 1
}
const output = [];
for (let rowIndex = 0; resultSet.next(); rowIndex++) {
let row = { _index: rowIndex }
columnsInResult.forEach(column => row[column] = resultSet.getColumnValue(column))
output.push(row)
}
return output
}
/**
* Function to call "system$clustering_information" and parse the results.
* Snowflake requires one call per column, but we combine all calls into one SQL statement
* using UNION ALL.
*/
function extendColumnsWithClusteringInformation(tableFqn, columns, runDescribeObject, columnsFilterClusteringInformation) {
let filteredColumns
if (runDescribeObject) { // If describe object has been called
// Filter out unsupported column types (causes errors on Snowflake)
filteredColumns = columns.filter(column => !['OBJECT', 'VARIANT', 'ARRAY'].includes(column.type))
// If a list of columns to filter on is provided, use it
if (columnsFilterClusteringInformation)
filteredColumns = filteredColumns.filter(column => columnsFilterClusteringInformation.includes(column.name))
} else { // If describe object hasn't been called, use the columns provided in the filter
// Convert it to the expected format
filteredColumns = columnsFilterClusteringInformation.map(columnName => ({ name: columnName }))
}
if (!filteredColumns.length) return
const sql = filteredColumns.map(column =>
`select ${column?._index || 0} as column_index, '${column.name}' as column_name, system$clustering_information('${tableFqn}', '("${column.name}")') as clustering_information`
).join("\nunion all\n")
const resultSet = snowflake.execute({ sqlText: sql })
while (resultSet.next()) {
let colIndex = resultSet.getColumnValue(1)
let clusteringInformation = JSON.parse(resultSet.getColumnValue(3))
if (runDescribeObject) { // If there's already data available from describe object, extend it
columns[colIndex] = { ...columns[colIndex], clustering_information: clusteringInformation }
} else { // Otherwise, append to the array
columns.push({ clustering_information: clusteringInformation })
}
}
}
/**
* Function to call "system$clustering_information" without any expression, which defaults
* to the existing clustering key if it's defined in the table.
*/
function fetchClusteringKeyInformation(tableFqn) {
try {
const resultSet = snowflake.execute({ sqlText: `select system$clustering_information('${tableFqn}')` })
resultSet.next()
return JSON.parse(resultSet.getColumnValue(1))
} catch (err) {
// "Invalid clustering keys or table ... is not clustered"
return null;
}
}
let columns = null
let clusteringKeyClusteringInformation = null
if (RUN_DESCRIBE_OBJECT) {
columns = describeObject(OBJECT_TYPE, OBJECT_FQN)
// If the ADD_CLUSTERING_INFORMATION argument is true, append the output with
// the results from system$clustering_information
if (ADD_CLUSTERING_INFORMATION) {
extendColumnsWithClusteringInformation(OBJECT_FQN, columns, true, COLUMNS_FILTER_CLUSTERING_INFORMATION)
clusteringKeyClusteringInformation = fetchClusteringKeyInformation(OBJECT_FQN)
}
} else if (ADD_CLUSTERING_INFORMATION && COLUMNS_FILTER_CLUSTERING_INFORMATION) {
// If only one column is passed as a filter and it is an empty string, fetch clustering data about
// the clustering key
if (COLUMNS_FILTER_CLUSTERING_INFORMATION?.length === 1 && COLUMNS_FILTER_CLUSTERING_INFORMATION[0] === '') {
clusteringKeyClusteringInformation = fetchClusteringKeyInformation(OBJECT_FQN)
} else {
columns = []
extendColumnsWithClusteringInformation(OBJECT_FQN, columns, false, COLUMNS_FILTER_CLUSTERING_INFORMATION)
}
}
return { clusteringKey: clusteringKeyClusteringInformation, columns: columns, version: 'v1' }
$$;
grant usage on procedure select_utils.utils.sproc_describe_object(varchar, varchar, boolean, boolean, array) to role select_user;
[OPTIONAL] Step 8 - Update Network Policies
If your Snowflake account has configured a network policy to restrict access based on IP address, you'll need to update the network policy to allow SELECT to connect to your Snowflake account.
If you're using Snowsight, head to Admin » Security » Network Policies. If you're using the Classic Web Interface, head to Account » Policies. From there, you can add our IP address, 34.23.79.180
, to your "Allowed IP Addresses".
Updating an account wide network policy
If you are updating a network policy that is currently active for all users in your Snowflake account, no action is required once you've added the IP address to the allowed list.
Creating a new network policy
If you would instead prefer to create a new network policy containing our IP address, you must also activate that network policy for the SELECT user by running a command like:
alter user select_user set network_policy = your_new_network_policy_name
Additional resources
See the Snowflake documentation for more details, or reach out to using the chat button on the bottom right if you run into any issues.
Next Steps
Head to our instructions on connecting your Snowflake account and add the details for the new Snowflake user you just created.
Notes
1 We recommend using a tool like 1Password to both randomly generate and save the password. Alternatively, you can use a password generator website and store the password securely elsewhere. ↩
2 If you'd prefer to have SELECT run the metadata extracts from an existing warehouse, you can run the commands below instead:
use role accountadmin;
grant monitor, operate, usage on warehouse YOUR_PREFERED_WAREHOUSE_NAME to role select_user;
alter user select_user set default_warehouse=YOUR_PREFERED_WAREHOUSE_NAME