Top Gradient

Connect your BigQuery Organization

Prerequisites

  • You need the following permissions to complete this setup:
    • Billing Account Costs Manager (or Billing Account Administrator) on the Cloud Billing account to enable the detailed usage cost export. Pricing data and CUD metadata exports require Billing Account Administrator.
    • Organization Admin (or another role that includes resourcemanager.organizations.setIamPolicy) to grant org-level IAM roles.
    • Project IAM Admin on the project that will host the billing dataset and service account to create the service account and grant project-level roles.
  • The following APIs enabled on the project that will host the billing dataset and service account:
    • bigquery.googleapis.com
    • iam.googleapis.com
    • iamcredentials.googleapis.com (required for service account impersonation frequent silent-failure cause if missing)
    • cloudresourcemanager.googleapis.com

Connecting your account

The setup has two parts: enabling the billing data exports (Console), and granting SELECT read access to BigQuery and the exported data (gcloud).

For the rest of this guide, we'll use customer-billing-project to refer to the project that will host both the billing export dataset and the service account SELECT uses. Substitute your own project ID.

Manual Setup

Step 1: Enable billing exports

SELECT uses three billing exports to give you complete cost visibility. All three should target the same dataset.

  • In the Google Cloud Console, navigate to Billing. You will be redirected to the Cloud Billing account page.
  • In the left pane, click Billing export.

1a. Detailed usage cost export

  • On the BigQuery export tab, find Detailed usage cost and click Edit settings.
  • For Project, select customer-billing-project.
  • For Dataset, click Create new dataset:
    • Enter a Dataset ID (e.g. billing_export). Note this down — you'll need it later.
    • Choose a Multi-region location (US or EU). Multi-region locations backfill data from the start of the previous month; single-region locations do not backfill, so you'll only see data from the day you enable the export forward. This choice is permanent.
    • Use a dataset dedicated to billing export SELECT will receive read access to the entire dataset.
  • Click Save.

1b. Pricing data export

  • Back on the Billing export page, find Pricing data and click Edit settings.
  • Select the same project and dataset.
  • Click Save.

1c. Committed use discount (CUD) data export

  • Find Committed use discount data and click Edit settings.
  • Select the same project and dataset.
  • Click Save.

It can take up to 24 hours for the first batch of billing data to appear.

Step 2: Create the service account

In Cloud Shell or any environment with gcloud authenticated:

1export CUSTOMER_PROJECT=customer-billing-project
2export ORG_ID=<your-organization-id>
3export BILLING_DATASET=<your-billing-export-dataset> # e.g. billing_export
4
5gcloud iam service-accounts create select-viewer \
6 --display-name="SELECT replicator viewer" \
7 --project=${CUSTOMER_PROJECT}

This creates the service account [email protected]. Note this email down, you'll need it in the final step.

Step 3: Grant project-level roles

Grant the service account permission to run queries and use the Storage Read API on the project that will be billed for SELECT's queries:

1for ROLE in roles/bigquery.jobUser roles/bigquery.readSessionUser; do
2 gcloud projects add-iam-policy-binding ${CUSTOMER_PROJECT} \
3 --member=serviceAccount:select-viewer@${CUSTOMER_PROJECT}.iam.gserviceaccount.com \
4 --role=$ROLE
5done

Step 4: Grant organization-level roles

These roles let SELECT enumerate metadata across the org, jobs, datasets, tables, reservations, capacity commitments, and the project hierarchy. They do not grant access to the data inside any dataset.

  • roles/bigquery.resourceViewer includes bigquery.jobs.listAll, bigquery.reservations.list, and bigquery.capacityCommitments.list — covers INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION, RESERVATIONS, RESERVATIONS_TIMELINE, RESERVATION_CHANGES, and CAPACITY_COMMITMENTS.
  • roles/bigquery.metadataViewer includes bigquery.datasets.get and table listing permissions, covers INFORMATION_SCHEMA.SCHEMATA, TABLES, TABLE_STORAGE, and similar metadata views across the org.
  • roles/browser includes resourcemanager.projects.get and .list lets SELECT enumerate projects under the org (needed to auto-discover reservation administration projects) and resolve project IDs into human-readable names. This is read-only access to the resource hierarchy.
1for ROLE in roles/bigquery.resourceViewer roles/bigquery.metadataViewer roles/browser; do
2 gcloud organizations add-iam-policy-binding ${ORG_ID} \
3 --member=serviceAccount:select-viewer@${CUSTOMER_PROJECT}.iam.gserviceaccount.com \
4 --role=$ROLE
5done

Step 5: Grant read access to the billing dataset

Grant bigquery.dataViewer scoped to only the billing export dataset (not the whole project):

1gcloud alpha bq datasets add-iam-policy-binding \
2 ${BILLING_DATASET} \
3 --project=${CUSTOMER_PROJECT} \
4 --member=serviceAccount:select-viewer@${CUSTOMER_PROJECT}.iam.gserviceaccount.com \
5 --role=roles/bigquery.dataViewer

Step 6: Allow SELECT to impersonate the service account

SELECT's backend service account ([email protected]) authenticates by impersonating the service account you just created. No service account keys are exchanged.

1gcloud iam service-accounts add-iam-policy-binding \
2 select-viewer@${CUSTOMER_PROJECT}.iam.gserviceaccount.com \
3 --member=serviceAccount:[email protected] \
4 --role=roles/iam.serviceAccountTokenCreator \
5 --project=${CUSTOMER_PROJECT}

Automated setup

1#!/usr/bin/env python3
2"""
3SELECT for BigQuery — automated onboarding script.
4
5Idempotent. Re-running is safe; resources are created/granted only when
6missing.
7
8Prerequisites:
9 1. Your billing export is ALREADY configured in the Cloud Console for the
10 dataset named below. Without an existing dataset we can't grant access
11 to it. See:
12 https://cloud.google.com/billing/docs/how-to/export-data-bigquery-setup
13 2. You're authenticated as a principal that has, at minimum:
14 - Service Usage Admin on PROJECT_ID
15 - Project IAM Admin on PROJECT_ID
16 - Organization Admin (or a role with resourcemanager.organizations.
17 setIamPolicy) on ORG_ID
18 Run once locally:
19 gcloud auth application-default login
20
21Install:
22 pip install google-cloud-bigquery google-auth requests
23
24Run:
25 Edit the CONFIG block below, then:
26 python setup-select.py
27"""
28
29# ============================================================
30# CONFIG — edit these values, then run the script.
31# ============================================================
32PROJECT_ID = "customer-billing-project"
33ORG_ID = "123456789012"
34# Either `dataset_name` (the dataset lives in PROJECT_ID) or
35# `other-project.dataset_name` if the billing export lives elsewhere.
36BILLING_DATASET = "billing_export"
37SA_NAME = "select-viewer"
38# region-us, region-eu, region-asia-east1, etc. — whichever region your jobs run in.
39REGION = "region-us"
40# ============================================================
41
42
43import json
44import sys
45import time
46from datetime import datetime
47
48import google.auth
49import google.auth.transport.requests
50import requests
51from google.api_core import exceptions as gax_exc
52from google.auth import impersonated_credentials
53from google.cloud import bigquery
54
55
56SELECT_BACKEND_SA = "[email protected]"
57
58REQUIRED_APIS = [
59 "bigquery.googleapis.com",
60 "iam.googleapis.com",
61 "iamcredentials.googleapis.com",
62 "cloudresourcemanager.googleapis.com",
63 "serviceusage.googleapis.com",
64]
65
66PROJECT_ROLES = [
67 "roles/bigquery.jobUser",
68 "roles/bigquery.readSessionUser",
69]
70
71ORG_ROLES = [
72 "roles/bigquery.resourceViewer",
73 "roles/bigquery.metadataViewer",
74 "roles/browser",
75]
76
77
78# ---------- output helpers ----------
79
80def step(msg: str) -> None:
81 print(f"\n==> {msg}")
82
83
84def ok(msg: str) -> None:
85 print(f" [OK] {msg}")
86
87
88def warn(msg: str) -> None:
89 print(f" [!] {msg}")
90
91
92def fail(msg: str) -> None:
93 print(f" [X] {msg}")
94
95
96# ---------- REST helper ----------
97
98_CREDS = None
99_AUTH_REQ = google.auth.transport.requests.Request()
100
101
102def _creds():
103 global _CREDS
104 if _CREDS is None:
105 _CREDS, _ = google.auth.default(
106 scopes=["https://www.googleapis.com/auth/cloud-platform"],
107 )
108 if not _CREDS.valid:
109 _CREDS.refresh(_AUTH_REQ)
110 return _CREDS
111
112
113def gcp_api(method: str, url: str, body: dict | None = None,
114 allow: tuple[int, ...] = ()) -> dict | None:
115 """REST call against any GCP JSON API. Returns parsed JSON (or None for
116 no-content). `allow` swallows specific HTTP statuses (e.g. 409 ALREADY
117 EXISTS) and returns None for them — used for idempotency."""
118 headers = {
119 "Authorization": f"Bearer {_creds().token}",
120 "Content-Type": "application/json",
121 }
122 resp = requests.request(method, url, headers=headers,
123 data=json.dumps(body) if body is not None else None,
124 timeout=60)
125 if resp.status_code in allow:
126 return None
127 if resp.status_code >= 400:
128 raise RuntimeError(
129 f"{method} {url} -> HTTP {resp.status_code}\n{resp.text}"
130 )
131 return resp.json() if resp.content else None
132
133
134# ---------- input parsing ----------
135
136def parse_billing_dataset() -> tuple[str, str]:
137 if "." in BILLING_DATASET:
138 proj, ds = BILLING_DATASET.split(".", 1)
139 return proj, ds
140 return PROJECT_ID, BILLING_DATASET
141
142
143# ---------- steps ----------
144
145def enable_apis() -> None:
146 step("Step 1/6 — enable required APIs")
147 body = {"serviceIds": REQUIRED_APIS}
148 op = gcp_api(
149 "POST",
150 f"https://serviceusage.googleapis.com/v1/projects/{PROJECT_ID}/services:batchEnable",
151 body=body,
152 )
153 # Long-running op — poll briefly.
154 op_name = op.get("name") if op else None
155 if op_name and not op.get("done"):
156 for _ in range(30):
157 time.sleep(2)
158 resp = gcp_api("GET", f"https://serviceusage.googleapis.com/v1/{op_name}")
159 if resp and resp.get("done"):
160 if "error" in resp:
161 fail(f"batchEnable error: {resp['error']}")
162 return
163 break
164 for api in REQUIRED_APIS:
165 ok(api)
166
167 # If the billing dataset is in a different project, make sure BigQuery is
168 # enabled there too. Best-effort: the customer running onboarding usually
169 # does NOT own the billing dataset's project, so a 403 here is expected
170 # and harmless — bigquery.googleapis.com is essentially always already
171 # enabled in any project that hosts a billing export.
172 bp, _ = parse_billing_dataset()
173 if bp != PROJECT_ID:
174 try:
175 gcp_api(
176 "POST",
177 f"https://serviceusage.googleapis.com/v1/projects/{bp}/services:batchEnable",
178 body={"serviceIds": ["bigquery.googleapis.com"]},
179 )
180 ok(f"bigquery.googleapis.com on {bp} (billing dataset host)")
181 except RuntimeError as e:
182 if "403" in str(e) or "PERMISSION_DENIED" in str(e):
183 warn(f"can't enable APIs on {bp} (no permission). Assuming "
184 "bigquery.googleapis.com is already enabled there.")
185 else:
186 raise
187
188
189def sa_email() -> str:
190 return f"{SA_NAME}@{PROJECT_ID}.iam.gserviceaccount.com"
191
192
193def create_service_account() -> None:
194 step(f"Step 2/6 — service account {sa_email()}")
195 existing = gcp_api(
196 "GET",
197 f"https://iam.googleapis.com/v1/projects/{PROJECT_ID}/serviceAccounts/{sa_email()}",
198 allow=(404,),
199 )
200 if existing:
201 ok("already exists")
202 return
203 gcp_api(
204 "POST",
205 f"https://iam.googleapis.com/v1/projects/{PROJECT_ID}/serviceAccounts",
206 body={
207 "accountId": SA_NAME,
208 "serviceAccount": {"displayName": "SELECT replicator viewer"},
209 },
210 # 409 if it was created between our describe and create — fine.
211 allow=(409,),
212 )
213 ok("created")
214
215
216def _add_iam_member(get_url: str, set_url: str, role: str, member: str) -> bool:
217 """Read-modify-write on a Cloud IAM policy. Returns True if the policy
218 was changed (member added), False if member was already present."""
219 policy = gcp_api("POST", get_url, body={}) or {}
220 bindings = policy.get("bindings", [])
221 for b in bindings:
222 if b.get("role") == role:
223 if member in b.get("members", []):
224 return False
225 b.setdefault("members", []).append(member)
226 break
227 else:
228 bindings.append({"role": role, "members": [member]})
229 policy["bindings"] = bindings
230 gcp_api("POST", set_url, body={"policy": policy})
231 return True
232
233
234def grant_project_iam() -> None:
235 step(f"Step 3/6 — project IAM on {PROJECT_ID}")
236 member = f"serviceAccount:{sa_email()}"
237 get_url = f"https://cloudresourcemanager.googleapis.com/v1/projects/{PROJECT_ID}:getIamPolicy"
238 set_url = f"https://cloudresourcemanager.googleapis.com/v1/projects/{PROJECT_ID}:setIamPolicy"
239 for role in PROJECT_ROLES:
240 changed = _add_iam_member(get_url, set_url, role, member)
241 ok(f"{role} {'granted' if changed else '(already present)'}")
242
243
244def grant_org_iam() -> None:
245 step(f"Step 4/6 — org IAM on organizations/{ORG_ID}")
246 member = f"serviceAccount:{sa_email()}"
247 get_url = f"https://cloudresourcemanager.googleapis.com/v3/organizations/{ORG_ID}:getIamPolicy"
248 set_url = f"https://cloudresourcemanager.googleapis.com/v3/organizations/{ORG_ID}:setIamPolicy"
249 for role in ORG_ROLES:
250 changed = _add_iam_member(get_url, set_url, role, member)
251 ok(f"{role} {'granted' if changed else '(already present)'}")
252
253
254def grant_dataset_iam() -> None:
255 bp, bd = parse_billing_dataset()
256 step(f"Step 5/6 — bigquery.dataViewer on {bp}:{bd}")
257 client = bigquery.Client(project=bp)
258 try:
259 ds = client.get_dataset(bigquery.DatasetReference(bp, bd))
260 except gax_exc.NotFound:
261 fail(f"Dataset {bp}:{bd} does not exist. Configure the billing export "
262 "in the Cloud Console first, then re-run.")
263 sys.exit(1)
264 already = any(
265 e.role == "READER" and e.entity_id == sa_email()
266 for e in ds.access_entries
267 )
268 if already:
269 ok("READER already granted")
270 return
271 ds.access_entries = list(ds.access_entries) + [
272 bigquery.AccessEntry(role="READER", entity_type="userByEmail",
273 entity_id=sa_email()),
274 ]
275 client.update_dataset(ds, ["access_entries"])
276 ok("READER granted")
277
278
279def grant_token_creator() -> None:
280 step(f"Step 6/6 — allow {SELECT_BACKEND_SA} to impersonate {sa_email()}")
281 get_url = f"https://iam.googleapis.com/v1/projects/{PROJECT_ID}/serviceAccounts/{sa_email()}:getIamPolicy"
282 set_url = f"https://iam.googleapis.com/v1/projects/{PROJECT_ID}/serviceAccounts/{sa_email()}:setIamPolicy"
283 changed = _add_iam_member(get_url, set_url,
284 "roles/iam.serviceAccountTokenCreator",
285 f"serviceAccount:{SELECT_BACKEND_SA}")
286 ok(f"tokenCreator {'granted' if changed else '(already present)'}")
287
288
289# ---------- verification ----------
290
291def _impersonated_client() -> bigquery.Client:
292 src, _ = google.auth.default(
293 scopes=["https://www.googleapis.com/auth/cloud-platform"],
294 )
295 tgt = impersonated_credentials.Credentials(
296 source_credentials=src,
297 target_principal=sa_email(),
298 target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
299 lifetime=600,
300 )
301 return bigquery.Client(credentials=tgt, project=PROJECT_ID)
302
303
304def _dry_run(client: bigquery.Client, label: str, sql: str) -> bool:
305 cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
306 try:
307 job = client.query(sql, job_config=cfg)
308 gb = (job.total_bytes_processed or 0) / 1e9
309 ok(f"{label} (dry-run, est. {gb:.3f} GB if executed)")
310 return True
311 except Exception as e:
312 fail(f"{label}: {e}")
313 return False
314
315
316def verify() -> bool:
317 step("Verification — dry-runs (free) + one cheap read on billing export")
318 # Wait for IAM to propagate — tokenCreator was granted seconds ago.
319 waits = [0, 15, 30, 30]
320 client = _impersonated_client()
321 for attempt, delay in enumerate(waits, start=1):
322 if delay:
323 print(f" retrying impersonation in {delay}s "
324 f"(attempt {attempt}/{len(waits)})...")
325 time.sleep(delay)
326 try:
327 client.query("SELECT 1").result()
328 ok(f"impersonation works (attempt {attempt})")
329 break
330 except Exception as e:
331 last_err = e
332 else:
333 fail(f"impersonation failed: {last_err!r}")
334 return False
335
336 all_ok = True
337 # Dry-run probes: prove permission to read the views without scanning.
338 all_ok &= _dry_run(
339 client, "INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION",
340 f"SELECT job_id FROM `{REGION}`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION "
341 f"WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE)",
342 )
343 all_ok &= _dry_run(
344 client, "INFORMATION_SCHEMA.TABLE_STORAGE_BY_ORGANIZATION",
345 f"SELECT table_schema FROM "
346 f"`{REGION}`.INFORMATION_SCHEMA.TABLE_STORAGE_BY_ORGANIZATION",
347 )
348 all_ok &= _dry_run(
349 client, f"INFORMATION_SCHEMA.SCHEMATA ({PROJECT_ID})",
350 f"SELECT schema_name FROM "
351 f"`{PROJECT_ID}.{REGION}`.INFORMATION_SCHEMA.SCHEMATA",
352 )
353
354 # One real read on the billing dataset. dry-run is enough for query-level
355 # views; here we want to actually exercise the dataset-level READER grant.
356 # We constrain on _PARTITIONTIME so the scan is tiny — LIMIT alone would
357 # not reduce the scan.
358 bp, bd = parse_billing_dataset()
359 try:
360 tables = list(client.list_tables(f"{bp}.{bd}"))
361 except Exception as e:
362 fail(f"list tables in {bp}:{bd}: {e}")
363 return False
364 billing = next((t for t in tables
365 if t.table_id.startswith("gcp_billing_export_resource_v1_")),
366 None)
367 if billing is None:
368 warn(f"no gcp_billing_export_resource_v1_* table in {bp}:{bd} yet — "
369 "the first export can take up to 24h. Re-run verify later.")
370 return all_ok
371
372 sql = (
373 f"SELECT cost FROM `{bp}.{bd}.{billing.table_id}` "
374 f"WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY) "
375 f"LIMIT 1"
376 )
377 try:
378 rows = list(client.query(sql).result(max_results=1))
379 ok(f"read {len(rows)} row(s) from {billing.table_id} "
380 f"(partition-pruned to last 2 days)")
381 except Exception as e:
382 fail(f"billing read: {e}")
383 all_ok = False
384 return all_ok
385
386
387# ---------- main ----------
388
389def main() -> None:
390 if PROJECT_ID in ("customer-billing-project",) or ORG_ID == "123456789012":
391 print("Edit the CONFIG block at the top of this script first.",
392 file=sys.stderr)
393 sys.exit(2)
394
395 bp, bd = parse_billing_dataset()
396 print("SELECT for BigQuery — onboarding")
397 print(f" PROJECT_ID: {PROJECT_ID}")
398 print(f" ORG_ID: {ORG_ID}")
399 print(f" BILLING_DATASET: {bp}:{bd}")
400 print(f" SERVICE_ACCOUNT: {sa_email()}")
401 print(f" REGION: {REGION}")
402
403 try:
404 enable_apis()
405 create_service_account()
406 grant_project_iam()
407 grant_org_iam()
408 grant_dataset_iam()
409 grant_token_creator()
410 except Exception as e:
411 fail(f"setup failed: {e}")
412 sys.exit(1)
413
414 verify_ok = verify()
415
416 print()
417 print("============================================================")
418 print(" Paste these into select.dev → Settings → BigQuery → Add Account:")
419 print("============================================================")
420 print(f" Organization ID: {ORG_ID}")
421 print(f" Billing export project ID: {bp}")
422 print(f" Billing export dataset name: {bd}")
423 print(f" Service account email: {sa_email()}")
424 print("============================================================")
425 if not verify_ok:
426 print("\nNote: one or more verification checks failed above. IAM grants")
427 print("can take a couple minutes to propagate; if the failures look")
428 print("transient, re-run the verify step.")
429 sys.exit(1)
430
431
432if __name__ == "__main__":
433 main()

Add account to SELECT

  • Navigate to Select: https://select.dev
  • Go to the BigQuery tab and click the Add Account button.
  • Enter a memorable account name, and fill out all the details from the previous steps:
    • Organization ID
    • Billing export project ID (e.g. customer-billing-project)
    • Billing export dataset name (e.g. billing_export)
    • Service account email (e.g. [email protected])
  • Click Add. SELECT will run a test query to verify the connection.

Get up and running in less than 15 minutes

Connect your Snowflake account and instantly understand your savings potential.

CTA Screen