Security Model and Architecture

Date

SELECT's Security Model

Secure by Design

SELECT only requires read access to a customer's Snowflake account metadata database. This database only includes metadata about how the customer is using Snowflake. No actual customer data or sensitive information is stored in this database. Some examples of this information include, but are not limited to:

  • The number of tables in each database, and the size of each table
  • The amount the customer was billed by Snowflake on a particular day
  • How frequently Snowflake's automatic clustering service is running
  • Metadata about the queries being run in an account (query runtime, tables accessed, etc.)

This data is stored in SELECT's own Snowflake account where insights can be derived and presented to our customers. We follow a principle of least privilege, and only extract the minimum subset of metadata required for SELECT's services.

Zero access to any customer data.

We do not have read or write access to any of the customer's data that is stored in Snowflake. This access is tightly controlled during the onboarding process where customers create a new user for SELECT with an extremely limited set of permissions.

SELECT Architecture

The security features discussed above can be further visualized in the diagram below outlining SELECT's secure & limited data access architecture.

SELECT system architecture with limited access

Data Flow

  1. The customer creates a new Snowflake service user with key-pair authentication and read-only access to their Snowflake metadata database. This user can only access the Snowflake metadata database and does not have any access whatsoever to customer data.
  2. Network policies can be applied to restrict this user to only accept connections from SELECT's fixed IP address (34.23.79.180).
  3. Metadata is extracted and loaded to an isolated GCP bucket dedicated to each customer.
  4. Data is stored in an isolated Snowflake schema specific to each customer, ensuring strict data segregation.
  5. Insights are derived from this dataset and presented to the customer in SELECT's web application.

User Access Flow

  1. Access to SELECT is governed by the customer's SSO system (Okta, Entra ID, etc.) if configured, or the default Google SSO and Username + Password authentication options. MFA can also be enabled.
  2. The user's access within SELECT is governed by roles, which can also be configured based on custom SSO Team memberships.
  3. SELECT uses secure logical separation to isolate tenant data. Users only have access to the data and insights from their own organization's Snowflake metadata. Access and allowed actions are further restricted by the user's configured roles.

See full details in the user management and RBAC documentation.

Query Text Sanitization

While rare, we recognize it is possible that Snowflake users within a customer's company may inadvertently include sensitive information in their query text. For example, an engineer may be debugging an issue and query all notifications sent to a particular user. They may even store some notes to themselves in the query comments:

/*
Customers to investigate:
1. Joe Smith - [email protected] - 123456789
2. Steve Jones - [email protected] - 987654321
*/
select
notification_id,
date_sent
from notifications_sent
where
email = '[email protected]'
or phone_number = 123456789 -- 987654321

SELECT is designed for this worst case scenario and can strip out any literal values and sensitive comments before storing the metadata in our account. This functionality can be enabled upon request. Using the same query example from above, we would only store the following query_text in our database:

select
notification_id,
date_sent
from notifications_sent
where
email = $1
or phone_number = $2

Similar scrubbing can be performed across any free-form text fields ingested from the customer's Snowflake account metadata database into SELECT's database.

What Snowflake metadata do we access?

SELECT accesses Snowflake usage metadata to present users with insights and recommendations related to cost & performance optimization. More information on the exact views we access and their purpose is provided below.

Account Usage

The following views from the account_usage schema are accessed. All views contain metadata about the customer's Snowflake usage. Examples include performance statistics about historical queries run, billing amounts for different Snowflake services, and performance data for virtual warehouses. Please refer to the Snowflake documentation for each view if additional information is required. The account usage views accessed are required to present customers with comprehensive cost and performance insights.

  • snowflake.account_usage.query_history
  • snowflake.account_usage.warehouse_events_history
  • snowflake.account_usage.warehouse_load_history
  • snowflake.account_usage.warehouse_metering_history
  • snowflake.account_usage.stage_storage_usage_history
  • snowflake.account_usage.database_storage_usage_history
  • snowflake.account_usage.storage_usage
  • snowflake.account_usage.metering_daily_history
  • snowflake.account_usage.metering_history
  • snowflake.account_usage.task_history
  • snowflake.account_usage.task_versions
  • snowflake.account_usage.serverless_task_history
  • snowflake.account_usage.automatic_clustering_history
  • snowflake.account_usage.materialized_view_refresh_history
  • snowflake.account_usage.pipe_usage_history
  • snowflake.account_usage.query_acceleration_history
  • snowflake.account_usage.search_optimization_history
  • snowflake.account_usage.replication_usage_history
  • snowflake.account_usage.access_history
  • snowflake.account_usage.tables
  • snowflake.account_usage.table_storage_metrics

Organization Usage

The following views from the organization_usage schema are accessed:

  • snowflake.organization_usage.contract_items: Contains information about a customer's current Snowflake contract. We use this to help provide users with budgeting forecasts.
  • snowflake.organization_usage.remaining_balance_daily: Contains information about a customer's remaining contract balance. Required to determine the effective rates to apply when calculating costs and for budget forecasting.
  • snowflake.organization_usage.rate_sheet_daily: Contains information about the effective rates applied on each day. Required to calculate spend data.
  • snowflake.organization_usage.usage_in_currency_daily: Contains information about how much a customer is being billed each day. Required to provide customers with Snowflake spend analytics.