Top Gradient

Connect your Databricks Workspace

Prerequisites

Connecting the workspace

Manual Setup

Step 1: Get account and workspace identifiers

Step 2: Create a service principal

  • In the workspace, select Settings from the top right Profile icon
  • Under Workspace admin, select Identity and access
  • Next to Service Principals, click Manage
  • Click Add service principal
    • An Add new service principal modal pops up, click Add new
    • Give the service principal a memorable name (e.g. select-service-principal)
  • Open the newly created service principal, select Secrets tab, then click Generate secret
  • Enter lifetime of 730 days, then click Generate
    • you will need to create a new secret and update it in SELECT when it expires
  • Note down your Secret and Client ID

Step 3: Create a Serverless SQL Warehouse

  • Navigate to SQLSQL Warehouse from the navigation menu
  • Click Create SQL warehouse with the following config
    • Name: pick a memorable name (e.g. select-sql-warehouse)
    • Cluster size: 2X-Small
    • Type: Serverless
    • Auto-stop: set to 5 minutes to minimize idle costs (can be reduced to 1 minute via the Databricks Warehouses API after creation — the UI minimum is 5 minutes)
  • Click Create
  • After the warehouse is created, the Manage permissions modal window is shown
    • You can also access this modal by clicking Permissions
  • Search for and select the service principal you created (using the name or client id).
  • Select Can Use permission, click Add and close the modal
  • Note down the Warehouse ID

Step 4: Grant Read Access to system catalog tables

Grant the service principal the exact privileges SELECT needs:

  • USE CATALOG on the system catalog
  • USE SCHEMA on each schema we read from
  • SELECT on each table we read from

Open a SQL editor in your Databricks workspace (attached to the warehouse created in Step 3), find/replace <service-principal-client-id> with the **Client ID** noted in Step 2, then run:

1-- Catalog
2grant use catalog on catalog system to `<service-principal-client-id>`;
3
4-- Schemas
5grant use schema on schema system.query to `<service-principal-client-id>`;
6grant use schema on schema system.billing to `<service-principal-client-id>`;
7grant use schema on schema system.compute to `<service-principal-client-id>`;
8grant use schema on schema system.access to `<service-principal-client-id>`;
9grant use schema on schema system.lakeflow to `<service-principal-client-id>`;
10
11-- Tables
12grant select on table system.query.history to `<service-principal-client-id>`;
13
14grant select on table system.billing.usage to `<service-principal-client-id>`;
15grant select on table system.billing.account_prices to `<service-principal-client-id>`; -- skip on Azure
16grant select on table system.billing.list_prices to `<service-principal-client-id>`;
17
18grant select on table system.compute.warehouses to `<service-principal-client-id>`;
19grant select on table system.compute.warehouse_events to `<service-principal-client-id>`;
20grant select on table system.compute.clusters to `<service-principal-client-id>`;
21grant select on table system.compute.node_timeline to `<service-principal-client-id>`;
22
23grant select on table system.access.table_lineage to `<service-principal-client-id>`;
24grant select on table system.access.column_lineage to `<service-principal-client-id>`;
25grant select on table system.access.workspaces_latest to `<service-principal-client-id>`;
26
27grant select on table system.lakeflow.job_run_timeline to `<service-principal-client-id>`;
28grant select on table system.lakeflow.job_task_run_timeline to `<service-principal-client-id>`;
29grant select on table system.lakeflow.jobs to `<service-principal-client-id>`;
30grant select on table system.lakeflow.job_tasks to `<service-principal-client-id>`;
31grant select on table system.lakeflow.pipelines to `<service-principal-client-id>`;

Step 5: Allow SELECT IP Addresses

If your Databricks workspace has configured a network policy to restrict access based on IP address, you'll need to update the network policy to allow SELECT to connect. Refer to the following documentation for more information:

134.23.79.180,
234.139.189.198,
335.229.105.183,
435.243.245.125,
534.138.184.161,
634.73.135.174,
734.148.143.80,
834.23.67.112,

The following python code can be run in a notebook if you would like to configure this at a workspace level:

1from databricks.sdk import WorkspaceClient
2from databricks.sdk.service.settings import ListType
3
4w = WorkspaceClient()
5
6# Create a new IP allow list
7created_list = w.ip_access_lists.create(
8 label="SELECT-allowed-ips",
9 list_type=ListType.ALLOW,
10 ip_addresses=[
11 "34.23.79.180",
12 "34.139.189.198",
13 "35.229.105.183",
14 "35.243.245.125",
15 "34.138.184.161",
16 "34.73.135.174",
17 "34.148.143.80",
18 "34.23.67.112",
19 ],
20)

Step 6: Enabling Automated Savings

Automated Savings features for Databricks are in private preview. Please contact SELECT to request access.

The Automated Savings feature for All-Purpose Compute, Jobs Compute and SQL Warehouses requires granting the CAN_MANAGE permission to the SELECT Service Principal on each of the relevant resources. For Jobs Compute, the permissions should be granted on the Job itself rather than the compute resources which it creates. Repeat this for all Clusters, Warehouses and Jobs in your workspace.

All-Purpose Compute

Editing permissions on All-purpose Compute

Jobs

Editing permissions on Jobs

SQL Warehouses

Editing permissions on SQL Warehouses

If, as part of evaluating an All-Purpose Compute cluster for potential savings, granting the CAN_ATTACH_TO permission allows access to Spark metrics and better estimates of savings potential, before full management access is granted to enable the feature. The SELECT Automated Savings agent will begin monitoring all clusters automatically where this permission is granted.

Automated setup

Running the following script will do all of the manual steps above (apart from creating the secret). This script provisions a Databricks service principal for select.dev by creating/finding the SP, assigning it to the target workspace, granting system catalog access, and setting up a SQL warehouse. If USE_AUTOMATED_SAVINGS is enabled, it also iterates all workspaces in the same region and grants the SP CAN_MANAGE permissions on all jobs, clusters, and warehouses to enable automated savings features. If IP_ALLOW_LIST_ENABLED is True, it will create an allowed list of IPs for all the workspaces as well. Copy and save to a file locally (e.g. setup-select.py), edit the first 10 lines with your details, and then just run python setup-select.py:

1# Dependencies: pip install databricks-sdk
2
3# ── Config — edit these before running ───────────────────────────────────────
4
5WORKSPACE_URL = "<https://dbc-xxxxx.cloud.databricks.com>"
6ACCOUNT_ID = "<databricks-account-id>"
7SERVICE_PRINCIPAL_NAME = "<service-principal-name>"
8WAREHOUSE_NAME = "<sql-warehouse-name>"
9IP_ALLOW_LIST_ENABLED = False
10IP_ALLOW_LIST_LABEL = "select-ip-allow-list"
11USE_AUTOMATED_SAVINGS = True

Add workspace to SELECT

  • Navigate to settings: https://select.dev/app/settings
  • Go to the Databricks tab and click the Add Workspace button
  • Enter a memorable workspace name, and fill out all the details you have noted in the previous steps:
    • Account ID
    • Workspace URL
    • Warehouse ID
    • Client ID
    • Client Secret
  • Click Add and you are good to go!

Get up and running in less than 15 minutes

Connect your Snowflake account and instantly understand your savings potential.

CTA Screen