Top Gradient

Customer-Side Metadata Sanitization

Overview

Some customers may wish to sanitize inputs such as query text. This document gives various options for how to achieve this. Only enable this if required for compliance, as it makes queries harder to read.

Option 1: Query text sanitization on SELECT side

The simplest approach is to toggle on "Query Sanitization." Upon ingesting your metadata into SELECT's systems, this setting will strip string literals, numbers, and single-line comments from query text (gist demonstrating this code).

Option 2: Sanitize query text yourself

You can also sanitize query text yourself before your data reaches SELECT's systems. You can deviate from the SELECT setup script by withholding the GOVERNANCE_VIEWER permission, denying us direct access to Snowflake's ACCESS_HISTORY and QUERY_HISTORY views.

Skip running this suggestion:

1GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE identifier($select_role);

And instead grant the permissions needed to read other views that SELECT needs:

1GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE identifier($select_role);
2GRANT DATABASE ROLE SNOWFLAKE.OBJECT_VIEWER TO ROLE identifier($select_role);
3GRANT DATABASE ROLE SNOWFLAKE.SECURITY_VIEWER TO ROLE identifier($select_role);
4GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_USAGE_VIEWER TO ROLE identifier($select_role);
5GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_BILLING_VIEWER TO ROLE identifier($select_role);
6GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_ACCOUNTS_VIEWER TO ROLE identifier($select_role);

Next, you need to provide the two missing views, with your customized sanitization. An example is below, which creates a database and schema, applies the query sanitization that SELECT would have run under Option 1 (which you can further customize), and grants SELECT access:

1GRANT USAGE ON DATABASE DEMO_SELECT_INGEST TO ROLE identifier($select_role);
2GRANT USAGE ON SCHEMA DEMO_SELECT_INGEST.SANITIZED TO ROLE identifier($select_role);
3GRANT SELECT ON ALL VIEWS IN SCHEMA DEMO_SELECT_INGEST.SANITIZED TO ROLE identifier($select_role);
4GRANT SELECT ON FUTURE VIEWS IN SCHEMA DEMO_SELECT_INGEST.SANITIZED TO ROLE identifier($select_role);
5
6CREATE OR REPLACE VIEW DEMO_SELECT_INGEST.SANITIZED.QUERY_HISTORY AS
7SELECT
8 QUERY_ID,
9 concat(
10 regexp_replace(
11 regexp_replace(
12 regexp_replace(
13 query_text,
14 ';'
15 ),

Now, under the "Advanced Settings" section of the Snowflake connection page, toggle on "Custom Account Usage Views", choose "Only query_history and access_history", and provide the database+schema name you just created (in the example above, DEMO_SELECT_INGEST.SANITIZED)

Option 3: All required account_usage views

Use this option if you want full control over all account_usage views that SELECT ingests. You'll create a stored procedure that builds pass-through views for every required view in a custom database, then manually customize any views where you need sanitization.

NOTE: SELECT may need to ingest more tables and columns over time as Snowflake introduces new features. This setup should help with the initial integration, but ongoing maintenance may be needed to keep the data ingestion healthy.

Skip this permission grant from the standard setup:

1GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE identifier($select_role);

SELECT still needs direct access to organization_usage views. Grant the following database roles:

1SET select_role = 'SELECT_USER'; -- replace with your SELECT role name
2
3GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_USAGE_VIEWER TO ROLE identifier($select_role);
4GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_BILLING_VIEWER TO ROLE identifier($select_role);
5GRANT DATABASE ROLE SNOWFLAKE.ORGANIZATION_ACCOUNTS_VIEWER TO ROLE identifier($select_role);

Next, create the custom database and schema:

1SET select_role = 'SELECT_USER'; -- replace with your SELECT role name
2
3CREATE DATABASE IF NOT EXISTS DEMO_SELECT_INGEST;
4CREATE SCHEMA IF NOT EXISTS DEMO_SELECT_INGEST.SANITIZED;
5
6GRANT USAGE ON DATABASE DEMO_SELECT_INGEST TO ROLE identifier($select_role);
7GRANT USAGE ON SCHEMA DEMO_SELECT_INGEST.SANITIZED TO ROLE identifier($select_role);
8GRANT SELECT ON ALL VIEWS IN SCHEMA DEMO_SELECT_INGEST.SANITIZED TO ROLE identifier($select_role);
9GRANT SELECT ON FUTURE VIEWS IN SCHEMA DEMO_SELECT_INGEST.SANITIZED TO ROLE identifier($select_role);

Next, create and run a stored procedure that creates a pass-through view for every required account_usage view. It queries information_schema to discover all columns and selects them by name (not SELECT *, which would break if Snowflake adds columns later).

1CREATE OR REPLACE PROCEDURE DEMO_SELECT_INGEST.SANITIZED.CREATE_BASE_VIEWS()
2RETURNS STRING
3LANGUAGE JAVASCRIPT
4EXECUTE AS CALLER
5AS
6$$
7var requiredViews = [
8'QUERY_HISTORY',
9'ACCESS_HISTORY',
10'SESSIONS',
11'TABLE_DML_HISTORY',
12'TABLE_PRUNING_HISTORY',
13'WAREHOUSE_METERING_HISTORY',
14'WAREHOUSE_EVENTS_HISTORY',
15'WAREHOUSE_LOAD_HISTORY',

From here, you now have views just passing through all required columns. To sanitize a view, use GET_DDL to retrieve its full definition, then modify it.

For example. to sanitize QUERY_TEXT in QUERY_HISTORY , first get the current view definition:

1SELECT GET_DDL('VIEW', 'DEMO_SELECT_INGEST.SANITIZED.QUERY_HISTORY');

This returns a CREATE OR REPLACE VIEW statement listing every column. Find the "QUERY_TEXT" column in the SELECT list and replace it with the following sanitization expression (or one of your choosing):

1concat(
2regexp_replace(
3regexp_replace(
4regexp_replace(
5"QUERY_TEXT",
6';'
7),
8$$(/\*([^*]|\*[^/])*\*/)|(--.*$)|(--.*[\n|\r])$$,
9'\n'
10),
11$$('([^']|'')*')|(\$\$.*\$\$)|(\b[+-]?[0-9]+([.][0-9]+)?\b)|(\B[.][0-9]+\b)|('([^']|'')*')$$,
12'?'
13),
14'\n',
15array_to_string(

Then run the modified CREATE OR REPLACE VIEW statement.

Finally, when setting up your Snowflake connection in SELECT, toggle on "Custom Account Usage Views" and "All required account_usage views" and then put the database+schema used in the above scripts.

Get up and running in less than 15 minutes

Connect your Snowflake account and instantly understand your savings potential.

CTA Screen