A deep dive into Snowflake's Git Integration
- Date
- Tomáš SobotíkSenior Data Engineer & Snowflake SME at Norlys
Ever since I started using Snowflake, there's one thing I always thought was missing: a native Git integration. Without version control, it was easy to mess up your infrastructure, but thankfully this is no longer the case. Snowflake's Git integration was released in April 2024 and was a feature I personally requested multiple times! Let’s take a closer look at this feature, explain how to use it, and showcase some typical use cases.
What is Snowflake’s Git Integration?
The Snowflake Git integration allows you to integrate your Snowflake account with one of the supported Git platforms (I.e. GitHub, GitLab, etc.) natively and synchronize the contents of a remote repository with your Snowflake account. For this synchronization, Snowflake uses a special type of stage called a repository stage. This repository stage is then synchronized with the remote Git repository and becomes a local repository with a full clone of the remote repository, including everything you would expect—branches, commits, etc.
With this synchronization, you have a full clone of the repository available directly in your Snowflake account. You can use the fetched files in your Snowflake applications or write UDFs/stored procedures where the handler code is stored in a remote Git repository and synchronized with the repository stage. This allows you to keep it under version control easily and refresh the stage whenever the repository is updated.
You can also use any file from any branch or commit directly in Snowflake.
Why is this New Feature so Exciting?
Thanks to this integration, you can simplify the development lifecycle for your code if you want to keep it under version control. Let’s take a stored procedure handler code as an example. Before this integration, you had to manage version control on your own and outside of Snowflake. You wrote and tested your handler code in VS Code. When you finished development, you had to commit the changes to a remote repository to keep them under version control. At the same time, you also had to deploy the code into Snowflake and create a stored procedure with that code as the handler.
This meant either manually uploading the code to a Snowflake stage and then creating the stored procedure or using the Snowflake CLI to handle both uploading and stored procedure creation. Either way, this was an additional step that was not directly connected to your versioned code. If you needed to modify the code, you had to go through the entire process again—develop the code, commit it to the repository, and update the handler file in Snowflake.
Let’s look at how the native Git integration can streamline this process. Since you can reference the handler code directly in the repository stage, whenever you commit changes and synchronize your repository stage, the stored procedure handler is automatically updated and remains under version control. No more multiple standalone processes (version control & SP updates)—just a single streamlined workflow.
The Snowflake Git integration allows you to connect Git repositories from the following supported platforms:
- GitHub
- GitLab
- BitBucket
- Azure DevOps
- AWS CodeCommit
End to End Example of the Git integration
In this example, we are going to write a simple Hello World stored procedure handler in Python, commit the changes to a remote GitHub repository, and bring it into Snowflake through the Git integration. Then, we will modify the handler code and see how easily we can update the SP handler code in Snowflake thanks to the Git integration. First, let’s look at a simple diagram illustrating what we are going to build:
- First, we develop the handler code locally in VS Code and commit it to a GitHub repository.
- After that, we create a repository stage in Snowflake, which points to the remote repository. We also need to create a new secret to store our Git account credentials.
- Once we have a repository stage in Snowflake, we synchronize it with the remote repository to bring the code into Snowflake.
- Finally, we create a stored procedure in Snowflake and import the handler code from our repository stage.
Let's walk through each step in more detail.
1. Handler code development
I created a simple handler function called hello_world and pushed it to my repository. Now, we can proceed with creating the necessary objects in Snowflake and linking this remote repository to a new repository stage in Snowflake.
2. Secret creation
Let’s start with secret creation. I use a personal access token for authentication.
3. API Integration creation
We also need a new API integration to connect Snowflake with GitHub. The API_ALLOWED_PREFIXES
parameter points to my GitHub account URL, and for authentication, we use the secret created in the previous step.
4. Repository stage creation
Finally, we can create a Git repository stage and pass the values we created earlier. The origin is our Git repository that we want to connect to.
5. Sync repository stage with remote
We have a repository stage in place; let’s sync it with the remote repository.
6. Stored procedure creation
And that’s it! Now, we have a working integration between Snowflake and the remote Git repository. Our stored procedure handler is sourced from the repository stage, which is synchronized with the remote repository. Let’s create a stored procedure:
7. Synchronization with remote repository
How can we try running it?
Let’s now modify the code to see how easily we can fetch updates in Snowflake. Again, we are going to make these updates locally in VS Code and commit them to the remote repository. Let’s make a simple change to convert the message to uppercase.
Once the changes are committed to the repository, we have to update (synchronize) our repository stage in Snowflake.
ALTER GIT REPOSITORY snowflake_git_demo FETCH;
Now, we have the updated handler code available in Snowflake! That’s it—nothing more is needed. No procedure recreation or anything like that. We can just run the stored procedure to verify that the code is updated:
8. Automatic update with new merge
There is a way to automate this. Suppose you have a standard way of working with Git, where you keep your code changes in standalone feature branches that need to be merged into the main branch. We can build a GitHub Actions workflow that runs the ALTER GIT REPOSITORY
command when the PR is merged, so the repository stage is automatically updated every time you push new code into your main branch!
Here is a simple example. You can use either SnowSQL or SnowCLI to run the SQL statement.
name: Deploying Stored procedure updates
env:
SNOWSQL_ACCOUNT: ${{secrets.SF_ACCOUNT}}
SNOWSQL_USER: ${{secrets.SF_USER}}
SNOWSQL_PWD: ${{secrets.SF_PASSWORD}}
SNOWSQL_DATABASE: ${{ secrets.SF_DATABASE }}
SNOWSQL_SCHEMA: ${{ secrets.SF_SCHEMA }}
SNOWSQL_ROLE: ${{ secrets.SF_ROLE }}
SNOWSQL_WAREHOUSE: ${{ secrets.SF_WAREHOUSE }}
on:
push:
jobs:
deploy-Snowflake-changes:
name: Snowflake infrastructure deployment
runs-on: ubuntu-latest
steps:
- name: Install SnowSQL
run: |
curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.32-linux_x86_64.bash
SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.32-linux_x86_64.bash
- name: Verify the installation
run: |
~/bin/snowsql -v
- name: Sync GIT with Snowflake GIT Stage
run: ~/bin/snowsql -q 'ALTER GIT REPOSITORY snowflake_git_demo FETCH; LS @snowflake_git_demo/branches/main;'
And here is the output from the GitHub Actions workflow run:
Additional Operations
Now that we've walked through how to set up the Git integration, there are more features which you may want to explore:
Managing Repositories in Snowflake
Our example showed just two operations related to Git repositories in Snowflake: the creation of the repository stage and the ALTER command for fetching remote updates. Let’s go through other examples of what you might want to do with your repositories:
List repository branches
We can list all branches in our repository stage, along with the path and commit hash.
SHOW GIT BRANCHES IN snowflake_git_demo;
List repository files
Similarly to listing files in any internal or external stage, we can also list the files in the repository stage using the LIST command:
LS @snowflake_git_demo
Repository stage
But for the repository stage, we can list files by individual branch name:
LS @snowflake_git_demo/branches/main
Individual commit hash
LS @snowflake_git_demo/commits/<<add_my_commit_hash>>
List files by tag name
LS @snowflake_git_demo/tags/tag_name
Check the repository stage properties
As with many other Snowflake objects, there are also SHOW
and DESCRIBE
commands for repository stages that can display useful metadata, such as where your repository stage is located (database and schema name), what the origin of the remote repo is, which API integration is used to connect Git and Snowflake, and what the secret name is that holds the secret for the remote Git repository connection.
Running the code from a repository
Keeping files in a repository is useful, but what about running the code stored in those files? Have you thought about that? Snowflake has the EXECUTE IMMEDIATE FROM
command, which allows you to run SQL code directly from files. You can store your account configuration (user, roles, warehouse creation) in SQL files in the repo, and thanks to this command, run it directly from the file:
EXECUTE IMMEDIATE FROM @snowflake_git_demo/branches/main/sql/users.sql
Copy code from repository into worksheet
You can also copy code from files stored in the repository stage into your worksheets. You can copy content from either .sql or .py files. Then, you can edit or run such code in Snowsight worksheets. You just need to navigate to the file you want to copy from and then select the Copy into worksheet option.
Limitations
If you want to save your changes back into the repository, you have to do it in your local copy of the repository because currently, repositories are read-only in Snowflake, except for in Notebooks, which can also write back. So you can only use Notebooks to write back to the repositories. Other limitations of the feature at the time of writing include:
- Read-only support - see above
- Only public internet access. You can’t access repositories that are behind a private network.
- The repository stage can’t be shared with data sharing.
- You can’t create a repository stage inside an application package.
- You can’t create a repository stage inside a native app on the customer side.
- Accessing files in the repository stage might be slower compared to accessing files in an internal Snowflake stage. Don’t use it for data ingestion use cases.
- Submodules are currently not supported.
Snowflake Git Integration Use Cases
The end-to-end example showed one of the possible use cases for using the Git integration in Snowflake: keeping your procedure/UDF code under version control in a remote repository. But it’s not the only use case for this feature.
Git integration is also crucial for managing your account using DevOps practices. You can keep the definitions of your Snowflake objects (databases, warehouses, users, roles, etc.) in a remote repository as SQL files with CREATE
or ALTER
definitions and run them as part of your CI/CD pipelines. Thanks to repository stages, you have a copy of those definitions available right in Snowflake, and you can run the code there directly without needing any external runner to deploy those objects!
Another use case could be related to your native/Streamlit apps, where you can have the code natively integrated with remote repositories. Last but not least, if you use Snowflake for your data transformations—DAGs with tasks or dynamic tables—you can keep the pipeline definitions under version control and easily integrate them with repository stages and the Snowflake environment.