A deep dive into Snowflake's Git Integration

Date
  • Tomáš Sobotík
    Senior Data Engineer & Snowflake SME at Norlys

Ever since I started using Snowflake, there's one thing I always thought was missing: a native Git integration. Without version control, it was easy to mess up your infrastructure, but thankfully this is no longer the case. Snowflake's Git integration was released in April 2024 and was a feature I personally requested multiple times! Let’s take a closer look at this feature, explain how to use it, and showcase some typical use cases.

What is Snowflake’s Git Integration?

The Snowflake Git integration allows you to integrate your Snowflake account with one of the supported Git platforms (I.e. GitHub, GitLab, etc.) natively and synchronize the contents of a remote repository with your Snowflake account. For this synchronization, Snowflake uses a special type of stage called a repository stage. This repository stage is then synchronized with the remote Git repository and becomes a local repository with a full clone of the remote repository, including everything you would expect—branches, commits, etc.

SELECT Snowflake Git Integration

With this synchronization, you have a full clone of the repository available directly in your Snowflake account. You can use the fetched files in your Snowflake applications or write UDFs/stored procedures where the handler code is stored in a remote Git repository and synchronized with the repository stage. This allows you to keep it under version control easily and refresh the stage whenever the repository is updated.

You can also use any file from any branch or commit directly in Snowflake.

Why is this New Feature so Exciting?

Thanks to this integration, you can simplify the development lifecycle for your code if you want to keep it under version control. Let’s take a stored procedure handler code as an example. Before this integration, you had to manage version control on your own and outside of Snowflake. You wrote and tested your handler code in VS Code. When you finished development, you had to commit the changes to a remote repository to keep them under version control. At the same time, you also had to deploy the code into Snowflake and create a stored procedure with that code as the handler.

This meant either manually uploading the code to a Snowflake stage and then creating the stored procedure or using the Snowflake CLI to handle both uploading and stored procedure creation. Either way, this was an additional step that was not directly connected to your versioned code. If you needed to modify the code, you had to go through the entire process again—develop the code, commit it to the repository, and update the handler file in Snowflake.

Let’s look at how the native Git integration can streamline this process. Since you can reference the handler code directly in the repository stage, whenever you commit changes and synchronize your repository stage, the stored procedure handler is automatically updated and remains under version control. No more multiple standalone processes (version control & SP updates)—just a single streamlined workflow.

The Snowflake Git integration allows you to connect Git repositories from the following supported platforms:

  • GitHub
  • GitLab
  • BitBucket
  • Azure DevOps
  • AWS CodeCommit

End to End Example of the Git integration

In this example, we are going to write a simple Hello World stored procedure handler in Python, commit the changes to a remote GitHub repository, and bring it into Snowflake through the Git integration. Then, we will modify the handler code and see how easily we can update the SP handler code in Snowflake thanks to the Git integration. First, let’s look at a simple diagram illustrating what we are going to build:

SELECT Snowflake Git Integration
  1. First, we develop the handler code locally in VS Code and commit it to a GitHub repository.
  2. After that, we create a repository stage in Snowflake, which points to the remote repository. We also need to create a new secret to store our Git account credentials.
  3. Once we have a repository stage in Snowflake, we synchronize it with the remote repository to bring the code into Snowflake.
  4. Finally, we create a stored procedure in Snowflake and import the handler code from our repository stage.

Let's walk through each step in more detail.

1. Handler code development

I created a simple handler function called hello_world and pushed it to my repository. Now, we can proceed with creating the necessary objects in Snowflake and linking this remote repository to a new repository stage in Snowflake.

SELECT Snowflake Git Integration

2. Secret creation

Let’s start with secret creation. I use a personal access token for authentication.

SELECT Snowflake Git Integration

3. API Integration creation

We also need a new API integration to connect Snowflake with GitHub. The API_ALLOWED_PREFIXES parameter points to my GitHub account URL, and for authentication, we use the secret created in the previous step.

SELECT Snowflake Git Integration

4. Repository stage creation

Finally, we can create a Git repository stage and pass the values we created earlier. The origin is our Git repository that we want to connect to.

SELECT Snowflake Git Integration

5. Sync repository stage with remote

We have a repository stage in place; let’s sync it with the remote repository.

SELECT Snowflake Git Integration

6. Stored procedure creation

And that’s it! Now, we have a working integration between Snowflake and the remote Git repository. Our stored procedure handler is sourced from the repository stage, which is synchronized with the remote repository. Let’s create a stored procedure:

SELECT Snowflake Git Integration

7. Synchronization with remote repository

How can we try running it?

SELECT Snowflake Git Integration

Let’s now modify the code to see how easily we can fetch updates in Snowflake. Again, we are going to make these updates locally in VS Code and commit them to the remote repository. Let’s make a simple change to convert the message to uppercase.

SELECT Snowflake Git Integration

Once the changes are committed to the repository, we have to update (synchronize) our repository stage in Snowflake.

ALTER GIT REPOSITORY snowflake_git_demo FETCH;

Now, we have the updated handler code available in Snowflake! That’s it—nothing more is needed. No procedure recreation or anything like that. We can just run the stored procedure to verify that the code is updated:

SELECT Snowflake Git Integration

8. Automatic update with new merge

There is a way to automate this. Suppose you have a standard way of working with Git, where you keep your code changes in standalone feature branches that need to be merged into the main branch. We can build a GitHub Actions workflow that runs the ALTER GIT REPOSITORY command when the PR is merged, so the repository stage is automatically updated every time you push new code into your main branch!

Here is a simple example. You can use either SnowSQL or SnowCLI to run the SQL statement.

name: Deploying Stored procedure updates
env:
SNOWSQL_ACCOUNT: ${{secrets.SF_ACCOUNT}}
SNOWSQL_USER: ${{secrets.SF_USER}}
SNOWSQL_PWD: ${{secrets.SF_PASSWORD}}
SNOWSQL_DATABASE: ${{ secrets.SF_DATABASE }}
SNOWSQL_SCHEMA: ${{ secrets.SF_SCHEMA }}
SNOWSQL_ROLE: ${{ secrets.SF_ROLE }}
SNOWSQL_WAREHOUSE: ${{ secrets.SF_WAREHOUSE }}
on:
push:
jobs:
deploy-Snowflake-changes:
name: Snowflake infrastructure deployment
runs-on: ubuntu-latest
steps:
- name: Install SnowSQL
run: |
curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.32-linux_x86_64.bash
SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.32-linux_x86_64.bash
- name: Verify the installation
run: |
~/bin/snowsql -v
- name: Sync GIT with Snowflake GIT Stage
run: ~/bin/snowsql -q 'ALTER GIT REPOSITORY snowflake_git_demo FETCH; LS @snowflake_git_demo/branches/main;'

And here is the output from the GitHub Actions workflow run:

SELECT Snowflake Git Integration

Additional Operations

Now that we've walked through how to set up the Git integration, there are more features which you may want to explore:

Managing Repositories in Snowflake

Our example showed just two operations related to Git repositories in Snowflake: the creation of the repository stage and the ALTER command for fetching remote updates. Let’s go through other examples of what you might want to do with your repositories:

List repository branches

We can list all branches in our repository stage, along with the path and commit hash.

SHOW GIT BRANCHES IN snowflake_git_demo;

List repository files

Similarly to listing files in any internal or external stage, we can also list the files in the repository stage using the LIST command:

LS @snowflake_git_demo

Repository stage

But for the repository stage, we can list files by individual branch name:

LS @snowflake_git_demo/branches/main

Individual commit hash

LS @snowflake_git_demo/commits/<<add_my_commit_hash>>

List files by tag name

LS @snowflake_git_demo/tags/tag_name

Check the repository stage properties

As with many other Snowflake objects, there are also SHOW and DESCRIBE commands for repository stages that can display useful metadata, such as where your repository stage is located (database and schema name), what the origin of the remote repo is, which API integration is used to connect Git and Snowflake, and what the secret name is that holds the secret for the remote Git repository connection.

Running the code from a repository

Keeping files in a repository is useful, but what about running the code stored in those files? Have you thought about that? Snowflake has the EXECUTE IMMEDIATE FROM command, which allows you to run SQL code directly from files. You can store your account configuration (user, roles, warehouse creation) in SQL files in the repo, and thanks to this command, run it directly from the file:

EXECUTE IMMEDIATE FROM @snowflake_git_demo/branches/main/sql/users.sql

Copy code from repository into worksheet

You can also copy code from files stored in the repository stage into your worksheets. You can copy content from either .sql or .py files. Then, you can edit or run such code in Snowsight worksheets. You just need to navigate to the file you want to copy from and then select the Copy into worksheet option.

SELECT Snowflake Git Integration

Limitations

If you want to save your changes back into the repository, you have to do it in your local copy of the repository because currently, repositories are read-only in Snowflake, except for in Notebooks, which can also write back. So you can only use Notebooks to write back to the repositories. Other limitations of the feature at the time of writing include:

  • Read-only support - see above
  • Only public internet access. You can’t access repositories that are behind a private network.
  • The repository stage can’t be shared with data sharing.
  • You can’t create a repository stage inside an application package.
  • You can’t create a repository stage inside a native app on the customer side.
  • Accessing files in the repository stage might be slower compared to accessing files in an internal Snowflake stage. Don’t use it for data ingestion use cases.
  • Submodules are currently not supported.

Snowflake Git Integration Use Cases

The end-to-end example showed one of the possible use cases for using the Git integration in Snowflake: keeping your procedure/UDF code under version control in a remote repository. But it’s not the only use case for this feature.

Git integration is also crucial for managing your account using DevOps practices. You can keep the definitions of your Snowflake objects (databases, warehouses, users, roles, etc.) in a remote repository as SQL files with CREATE or ALTER definitions and run them as part of your CI/CD pipelines. Thanks to repository stages, you have a copy of those definitions available right in Snowflake, and you can run the code there directly without needing any external runner to deploy those objects!

Another use case could be related to your native/Streamlit apps, where you can have the code natively integrated with remote repositories. Last but not least, if you use Snowflake for your data transformations—DAGs with tasks or dynamic tables—you can keep the pipeline definitions under version control and easily integrate them with repository stages and the Snowflake environment.

Tomáš Sobotík
Senior Data Engineer & Snowflake SME at Norlys
Tomas is a longstanding Snowflake Data SuperHero and general Snowflake subject matter expert. His extensive experience in the data world spans over a decade, during which he has served as a Snowflake data engineer, architect, and admin on various projects across diverse industries and technologies. Tomas is a core community member, actively sharing his expertise and inspiring others. He's also an O'Reilly instructor, leading live online training sessions.

Get up and running with SELECT in 15 minutes.

Snowflake optimization & cost management platform

Gain visibility into Snowflake usage, optimize performance and automate savings with the click of a button.

SELECT web application screenshot

Want to hear about our latest Snowflake learnings? 
Subscribe to get notified.