Up & Running With the Snowflake Connector for Python
- Date
- Ian WhitestoneCo-founder & CEO of SELECT
The Snowflake Connector for Python is a Python library maintained by the Snowflake team, that is the official Python driver for Snowflake. If you are looking to programmatically connect to Snowflake and run queries via Python, you’re in the right place. Many companies, including SELECT, use this library to orchestrate workflows in Snowflake or even build data-driven applications on top of Snowflake.
How to install the Snowflake Connector for Python
The Snowflake Connector is a native Python package, meaning it can be installed just like any other Python package.
Using pip
The easiest way to install the library is using pip, the official package management system for Python:
pip install snowflake-connector-python
Using conda
If you use conda as a package management system, you can install the package with the following:
conda install snowflake-connector-python
Quickstart Example
Here’s an end-to-end example of how to use the Snowflake Connector package to execute a query and process the results. This example shows the total credit consumption by virtual warehouse over the last 30 days:
import snowflake.connector
# Connect to Snowflake
conn = snowflake.connector.connect(
user='XXXX',
password='XXXX',
account='XXXX', # your Snowflake account identifier
)
# SQL statement to show the total credit consumption by warehouse
# over the last 30 days
sql = """
select
warehouse_name,
sum(credits_used_compute) as credits_used_compute,
from snowflake.account_usage.warehouse_metering_history
where
start_time > current_date - 30
group by 1
order by 2 desc
"""
# Create a cursor which will execute the query
cur = conn.cursor()
# Execute the query
cur.execute(sql)
# Return all the results
results = cur.fetchall()
# Loop through each record and print it
for row in results:
print(row)
# Close the connection
conn.close()
The data printed to the output will look something like this:
('REPLICATOR_LARGE', Decimal('669.080833333'))
('REPLICATOR_XSMALL', Decimal('613.355000002'))
('REPLICATOR_XLARGE', Decimal('416.437500000'))
('BACKEND_LARGE', Decimal('162.702500031'))
('BACKEND', Decimal('105.263888856'))
('BACKEND_MEDIUM', Decimal('47.033055570'))
('REPLICATOR_MEDIUM', Decimal('40.775833333'))
('REPLICATOR_SMALL', Decimal('31.046388890'))
('BACKEND_STAGING', Decimal('26.985555532'))
('BACKEND_SMALL', Decimal('12.798333345'))
('SELECT_DOGFOOD', Decimal('11.283055542'))
('ADMIN', Decimal('10.899444408'))
('BACKEND_STAGING_MEDIUM', Decimal('6.822222224'))
('BI_XSMALL', Decimal('1.814999976'))
('CLOUD_SERVICES_ONLY', Decimal('0E-9'))
Not sure where to find your account identifier?
You can find your account identifier easily from Snowsight as per our instructions here.
Core Concepts
Connection & Cursor Objects
As shown in the quickstart example above, there’s two separate objects we interact with: the connection object and the cursor object.
The connection object is responsible for authenticating and connecting to Snowflake, and returning cursors. Individual cursors are used to execute queries against Snowflake and retrieve results. A single connection can have many cursors. The same cursor can be used to execute multiple queries, you do not need to create a new one each time. The same holds true for the connection object. Establishing a connection to Snowflake takes a few seconds, so you will only want to do this once.
You can learn more about the different methods available on each object at the official Snowflake Python connector API docs.
Synchronous vs. Asynchronous Queries
There’s two ways to run queries with the Snowflake Connector for Python: synchronously and asynchronously.
Submitting a synchronous query
When you submit a synchronous query, your Python process will wait until the query is returned. This is the most common approach used when working with the Python connector due to its simplicity:
cur = conn.cursor()
cur.execute('select * from table') # Python will wait until the query is done before moving to next line
results = cur.fetchall()
Submitting an asynchronous query
With an asynchronous query, the Snowflake Python package will immediately return control to your Python process before the query completes. This is particularly helpful when you’re building multi-threaded applications in Python (like a web application) or using Python to execute many different queries at once.
To run an async query, use the execute_async
cursor method instead of the regular execute
method. Later in your code, you’ll need to poll Snowflake to see if
cur = conn.cursor()
cur.execute_async('select * from table') # Python will immediately move on to next line
query_id = cur.sfqid
while True:
if not conn.is_still_running(conn.get_query_status(query_id)):
break
time.sleep(1)
cur.get_results_from_sfqid(query_id)
results = cur.fetchall()
Transaction Control
By default, when you run queries in Snowflake through the Snowflake Connector package the queries are automatically committed.
For more fine-grained control over your SQL transactions, you can set autocommit=False
when establishing your connection.
Using the with context manager
Consider the following code. If any of the SQL statements fail, Snowflake will automatically roll them back. If they succeed, they will be automatically committed once the with
block is exited.
with snowflake.connector.connect(
user=USER,
password=PASSWORD,
account=ACCOUNT,
autocommit=False,
) as conn:
cur = conn.cursor()
cur.execute("insert into table values(...)")
cur.execute("insert into table values(...)")
Manually controlling with try-except
For more fine-grained control, you can manually commit SQL transactions or rollback them back using with the conn.commit()
and conn.close()
methods and the try-except-finally
pattern:
conn = snowflake.connector.connect(
user=USER,
password=PASSWORD,
account=ACCOUNT,
autocommit=False
)
try:
cur = conn.cursor()
cur.execute("insert into table values(...)")
cur.execute("insert into table values(...)")
conn.commit()
except Exception as e:
conn.rollback()
raise e
finally:
conn.close()
Closing your connection
As a best practice, you should close your connection at the end of your Python script with the conn.close()
method. To ensure the connection gets closed even if your script encounters an error, you can wrap your code in a try - finally
block:
try:
cur.execute('...')
results = cur.fetchall()
# ...
except:
# handle errors as needed
finally:
conn.close()
Another useful pattern is to use the with()
Python pattern which automatically closes the connection:
with snowflake.connector.connect(...) as conn:
cur = conn.cursor()
cur.execute('...')
results = cur.fetchall()
# conn.close() method is automatically code after with block
# Move on to other operations not involving Snowflake
with open('filename.csv', 'w') as f:
f.write(...)
More Examples
Let’s dive into some more examples that highlight other common patterns used with the Snowflake Python Connector.
Passing Parameters into your SQL statement
A common need for developers is the ability to pass different parameters into your SQL statements.
Python has excellent support for string manipulation with f-strings, which you can use for this exact purpose:
user_id = 123456
sql = f"select * from users where user_id={user_id}"
cur.execute(sql)
Alternatively, you can leverage bind variables:
warehouse_name='ADMIN'
sql="""
select *
from snowflake.account_usage.warehouse_metering_history
where warehouse_name=%s
"""
cur.execute(sql, (warehouse_name))
Fetching All Results
In all the examples shown in this post, we’ve leveraged the fetchall()
method to return all the results from the query:
cur.execute('select * from table')
results=cur.fetchall()
Processing Batches
If you want to minimize the memory consumed by your Python application, you can process batches of data at a time. Here’s an example that processes 100K records at a time:
cur.execute('select * from table')
while True:
rows = cur.fetchmany(100000)
if not rows:
break
for row in rows:
# Process each row
print(row)
Return a list of dictionaries
By default, Snowflake will return a list of tuples, and you will not know which column each entry in the tuple refers to. Here’s an example showing how you can return a list of dictionaries for each row, where the key is the name of the column:
sql = """
select
warehouse_name,
credits_used_compute
from snowflake.account_usage.warehouse_metering_history
limit 1
"""
cur.execute(sql)
results = cur.fetchall()
result_meta = cur.description
results = [
dict(zip([col.name for col in result_meta], row)) for row in results
]
# >>> print(results)
# [{'WAREHOUSE_NAME': 'COMPUTE_WH', 'CREDITS_USED_COMPUTE': Decimal('0E-9')}
Executing multiple SQL statements at once
When running the cursor.execute
method, you can only pass in a single SQL statement. To execute multiple SQL statements in one go, you can use the execute_string
method on the connection object. A list of cursors will then be returned, which you can leverage to process the results if necessary as per the examples above
sql="""
delete from table where user_id=123;
insert into table values(...);
"""
cursors = conn.execute_string(sql)
# return results from each cursor if needed
Setting Session Parameters
When establishing your Snowflake connection, you can conveniently set different session parameters. Examples of this can include setting a query tag so all queries executed from your connection are tagged, or setting a query timeout so that your queries are automatically cancelled after a certain period.
conn = snowflake.connector.connect(
user='XXXX',
password='XXXX',
account='XXXX',
session_parameters={
'QUERY_TAG': 'COST_ANALYTICS',
'STATEMENT_TIMEOUT_IN_SECONDS': 3600, # automatically cancel query after 1 hour
}
)
You can also pass in other parameters like warehouse
to control which virtual warehouse your queries will run on. If not supplied, queries will run on the default warehouse for that user.
You can see a full list of connection parameters here.
FAQ
What version of Python is needed for the Snowflake connector?
To use the Snowflake connector for Python, you must be on Python version 3.8 or higher.
What other Python libraries does Snowflake offer?
Since coming out with the Snowflake Connector for Python, they have since released a new Python library called the Snowflake Python API.
This library provides a number of different APIs for interacting with Snowflake resources via a first class Python API. Instead of using the Snowflake connector in Python to send SQL statements to Snowflake, the Python API allows you to interact with Snowflake through pure Python commands, without having to write any SQL.
Here’s an example that creates a new schema called analytics
with a table called temperature_readings
.
from snowflake.snowpark import Session
from snowflake.core import Root
connection_params = {
"account": "ACCOUNT-IDENTIFIER",
"user": "USERNAME",
"password": "PASSWORD",
}
session = Session.builder.configs(connection_params).create()
root = Root(session)
schema = database.schemas.create(Schema(name="analytics"), mode="orreplace")
table = schema.tables.create(
Table(
name="temperature_readings",
columns=[
TableColumn("TEMPERATURE", "int", nullable=False),
TableColumn("LOCATION", "string"),
],
),
mode="orreplace",
)
Under the hood, the Snowflake Python API installs the Snowflake Connector library and presumably uses it to execute the Snowflake queries generated by these Python functions/APIs.