Snowflake 101: Working with Stored Procedures
- Date
- Tomáš SobotíkLead Data Engineer at Tietoevry
Stored procedures have been a staple of RDBMS for ages, predating the cloud era and dedicated transformation tools. They have played a crucial role in building data pipelines and automating databases. In this post, we'll explore what stored procedures are and how to harness their potential in modern cloud-based platforms like Snowflake.
What are Stored Procedures?
Stored procedures enhance the SQL language with procedural logic, allowing the use of conditional statements, loops, and the creation of functionalities otherwise impossible without these constructs.
Difference between UDFs and Stored Procedures
In addition to stored procedures, you can create user-defined functions (UDFs). Understanding the distinction between them is crucial to deciding when to use UDFs and when to opt for stored procedures (SP).
Stored procedures are typically employed for administrative actions in Snowflake, such as cleaning up old data, dropping unused tables, or custom data backup. UDFs, on the other hand, come into play when you need to calculate and return a value as part of a SELECT
SQL query, like revenue for a given sales executive or employee bonuses.
The invocation of stored procedures uses the CALL
keyword as an independent statement:
CALL my_stored_procedure(input_param);
UDFs, however, are called as part of a SELECT statement:
SELECT column1, my_udf(input_parameter) FROM table1;
Another distinction lies in the return value; stored procedures may not return anything, simply performing a task. UDFs, on the contrary, must always return a value.
Let's explore some typical use cases and determine whether stored procedures or UDFs are more suitable:
Use case | UDF or Stored Procedure? |
---|---|
Creating a new user and dedicated warehouse | Stored Procedure |
Deleting all temporary tables | Stored Procedure |
Cleaning up unused tables | Stored Procedure |
Determining the city from an IP address | UDF |
Extracting browser type from a user agent string | UDF |
Calculating discounts for orders | UDF |
Loading a table from a stage | Stored Procedure |
Supported Programming Languages
Snowflake supports various languages for developing stored procedures. Choosing the right one depends on several factors:
- Personal language preference and knowledge
- Availability of necessary libraries
- Consistency with existing code in supported languages
- Whether you want to keep the code in-line or externally (as a standalone file in a stage)
Here is a summary of available languages along with information on whether they support in-line or staged handling:
Language | Handler Location |
---|---|
Java | In-line or staged |
JavaScript | In-line |
Python | In-line or staged |
Scala | In-line or staged |
Snowflake SQL Scripting | In-line |
Both in-line and staged options have their pros and cons. In-line development may be simpler, allowing you to fine-tune the code and add it directly to the SP definition. Staged handling is advantageous for compiled code (Java, Scala), enabling the reuse of existing compiled code in Snowflake stored procedures.
How to Create a Stored Procedure
Using Snowflake SQL Scripting
Let's delve into the syntax of creating stored procedures (SP) using the Snowflake SQL Scripting language and decipher the meaning behind different keywords. As shown in the picture, a stored procedure consists of the following:
- Stored procedure name
- Input parameters (arguments) along with their data types
- The return type of the stored procedure output
- The language specification
- The body of the stored procedure, where the actual logic will live
The hello_world
stored procedure above simply returns the message passed in as an argument. By calling it, we can see the output printed:
Using Javascript
Let's modify the hello_world
example and create it in JavaScript. Here is the code for the same procedure but now in JavaScript:
create or replace procedure hello_world_javascript(message varchar)
returns varchar not null
language javascript
as
$$
return message;
$$;
We defined JavaScript as the language, using different characters to enclose the procedure body ($$
).
Calling the procedure is the same:
Using Python
In this last example, we'll create the same hello_world
procedure in Python:
create or replace procedure hello_world_python(message varchar)
returns varchar not null
language python
runtime_version = 3.11
handler = 'hello_world'
packages = ('snowflake-snowpark-python')
as
$$
def hello_world(session, message):
return message;
$$;
You can notice multiple changes compared to SQL Scripting or JavaScript. Let’s update our diagram and highlight the changes:
- For Python, we must specify the Python version to use
- We must define a
handler
. This is the python function that will be run in the stored procedure - We must specify any Python packages we will use in our stored procedure
Calling the Python stored procedure is the same as any other language:
Calling (Running) Stored Procedures
Next, let’s talk about how you can run (or “call”) a stored procedure, especially with arguments.
Using Arguments in Stored Procedures
Stored procedures support defining arguments that you can pass from outside. We’ve used the argument called message
in our hello_world
example. You can also see how to specify the value when calling the stored procedure.
Arguments can be optional if they have a defined default value in SP definition. In that case, you can omit the argument when calling SP, and the default value will be used. Let’s modify our example to use an optional argument.
create or replace procedure hello_world(message varchar DEFAULT 'Hello from Stored procedure written in SQL scripting!')
returns varchar not null
language sql
as
begin
return message;
end;
Working with Optional and Required Arguments
In case you have both optional and required arguments, you have to specify the required ones first. Now we can call the procedure without the argument value, and Snowflake will use the default one.
But, of course, we can overwrite the default value and pass a new value when calling the procedure:
How to Use Bind Variables in a SQL Statement?
Arguments are also often used to pass values into SQL queries as bind variables inside stored procedures. This allows the SQL query to be dynamically created based on the arguments. Examples of this include passing a date, customer identifier, or table name that should be cleared. Let’s create a simple example where we are going to dynamically construct a SELECT statement and return values from a table for a given id:
-- create a test table
create table SP_TEST (
id number,
value varchar
);
-- insert data
insert into SP_TEST values (1, 'value 1');
insert into SP_TEST values (2, 'value 2');
-- create a stored procedure
CREATE PROCEDURE DYNAMIC_QUERY(ID number)
RETURNS VARCHAR
language sql
as
declare output varchar;
BEGIN
select value into :output from SP_TEST where id = :id;
return output;
END;
-- SP call
call DYNAMIC_QUERY(2);
The stored procedure code contains two new concepts. First is how to bind input arguments to SQL statements. Here we use a colon character before the argument name :id
. The code also declares an output variable and passes the result of the SELECT statement into this variable as part of SQL code. This is done by the :into <variable_name>
syntax.
Privileges and Execution Types
Stored procedures are database objects like tables or views. This means they are owned by a role. Apart from the OWNERSHIP
privilege, there is also a USAGE
privilege on the SP that can be granted to other roles. There is one more topic related to privileges and running stored procedures. Snowflake supports two different modes for how you can interact with them. You can either use Caller’s Rights or Owner’s Rights. When you create a procedure, you define whether you want to use the caller’s or owner’s rights. The default option is owner’s rights. Let’s explain both modes.
Caller’s Rights
When you run a SP with caller’s rights, it means the SP uses the privileges that the caller has. The procedure also has access to the caller’s session information - it can access session variables and use them in queries or modify them. Such session changes will persist after the SP call is done. SP uses database privileges that the caller has - it can access the same DB object as the caller. If SP contains any statement or DB object that the caller doesn't have permission to use, then the SP will raise a permissions error.
Owner’s Rights
In this case, the procedure runs with the rights of the owner and is decoupled from the caller. This means the SP can do things that the caller can’t do directly. This is a great example of how to delegate some tasks to other roles without giving them the privileges to do so. You can create an SP to delete old data from tables and give it to users for use without giving them the DELETE privilege on those tables. The procedure running with owner’s rights does not have access to the session information of the caller and can’t modify it either. SP also does not have access to variables created outside the SP - if that is needed, you have to pass the value as an input argument.
What Mode to Use?
It depends on your needs. In case you want to delegate something to other users, use the owner’s rights. Use the same mode if it doesn’t prevent SP from working properly or you don’t want to expose the procedure code to callers. On the other hand, use the caller’s rights mode if you need to have access to current session information or the procedure uses only objects that the caller owns or can use. Let’s summarize both options in the table:
Caller’s Rights | Owner’s Rights |
---|---|
Runs with privileges of the caller | Runs with privileges of the SP owner |
Has access to caller’s current session | Limited access to caller’s session |
Changes in session persist after end of SP call | Can’t change session state |
Can view, set, unset caller’s session variables and parameters | Can’t view, set, unset caller’s session variables and parameters |
Can use only objects and operations which caller can do | Decoupled from caller’s privileges |
Return Types
Procedures might return a value (number, string, boolean). It is also very very common for a SP to return nothing, since it may just execute some code and not need to return anything.
A special return type from SPs is a table. Let’s go through examples and show how to write a procedure without a return value and a procedure that returns a table.
create or replace procedure no_return_value()
returns varchar null
language sql
as
begin
select 1;
select 2;
select 3;
select 4;
end;
When you check CREATE PROCEDURE
statement in docs, you will find out that RETURNS
keyword is mandatory. To define stored procedure with no return value you have to define some return data type and omit the return
keyword in SP code block. It’s not necessary to define return type specifically with the NULL
definition. The code would work without that as well, i.e. RETURNS VARCHAR
.
When you call such a procedure, you will get a null value as output:
To return a table, you have to specify the return type as a table. If you know the data types of returned columns you can specify them right away with column names. Otherwise, you can do it during runtime.
To return a table, we need to introduce one more data type called RESULTSET
. This data type can hold the result of a SQL query. If you need to process it further you can iterate over rows or just return the result as we are going to do in this example:
create or replace procedure returning_table()
returns table(id number, name varchar)
language sql
as
declare
result RESULTSET DEFAULT (SELECT 1 id, 'test value' name);
begin
return table(result);
end;
In this example, we assign a default value to our result variable and then return it inside the table()
function. When you call this SP you will get a table with two columns as output:
Execute Multiple SQL Statements by Looping Through Results
We have gone through basic concepts around writing stored procedures in Snowflake. Let’s try to put all those concepts together now and create some real use cases to showcase how to use stored procedures for real tasks. One of the common administrative tasks could be executing multiple SQL statements dynamically created based on the results of another query. For instance, you might want to clean all empty tables. You have to find all tables with 0 records first and then iterate through and delete them. You can solve it with the following stored procedure:
create or replace procedure clean_empty_tables(db_name varchar)
returns varchar
language sql
as
declare
table_list RESULTSET DEFAULT (SELECT TABLE_NAME, TABLE_TYPE, TABLE_SCHEMA FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES WHERE table_catalog = :db_name and row_count = 0 and deleted is null);
c1 CURSOR FOR table_list;
drop_statement varchar;
begin
for record in c1 do
drop_statement := 'DROP ' || record.table_type || ' ' || :db_name || '.' || record.table_schema || '.' || record.table_name || ';';
execute immediate drop_statement;
end for;
return 'cleaning done;';
end;
Let’s go through the code to explain the logic.
- We pass the database name as a parameter to clean tables and views only in the defined DB.
- We declare a
RESULTSET
variabletable_list
holding the empty table names as a result of the query. - We also define a cursor for iterating through the results and a variable for the
DROP
statement which we will be dynamically creating and executing in the loop. - The procedure body then just goes through the query result row by row and constructs a drop statement based on values retrieved from the cursor. We listed both views and tables but if you want to clean only tables, you have to modify the input query.
- Once the drop statement is created we just execute it through
EXECUTE IMMEDIATE
.
When you call the stored procedure it should return a message 'cleaning done'. You can find the dropped tables/views in the query history or you can modify the return message and return also dropped object names.
Documenting Stored Procedures
A stored procedure is a piece of code and as such, it should be properly commented to simplify future maintenance and usage. There are multiple options on how you can do it. Either you will leverage some external tool like a data catalog or internal wiki and describe the stored procedure there. You should always focus on two aspects:
- Documentation for users and callers
- Documentation for programmers
Here are some examples of what should be included in the documentation:
- Describe the logic of the stored procedure and intention
- Describe authors
- Describe location - database and schema
- Describe input parameters - names, data types, and meaning
- Describe return values, possible errors, and exceptions
- Describe prerequisites
- What are the needed privileges
You can add comments directly into the source code and describe the algorithm. The last recommendation from my side would be to keep the stored procedures' source code under version control (GIT). Remember that stored procedures are not part of Time Travel in Snowflake. You can’t get the previous version of the code directly in Snowflake.
Stored Procedure Examples
Let's summarize all the stored procedure examples we showed in this post for quick reference:
Hello World
SQL Scripting
create or replace procedure hello_world(message varchar)
returns varchar not null
language sql
AS
begin
return message;
end;
Javascript
create or replace procedure hello_world_javascript(message varchar)
returns varchar not null
language javascript
as
$$
return message;
$$;
Python
create or replace procedure hello_world_python(message varchar)
returns varchar not null
language python
runtime_version = 3.11
handler = 'hello_world'
packages = ('snowflake-snowpark-python')
as
$$
def hello_world(session, message):
return message;
$$;
Dynamically generate SQL query using arguments
-- create a test table
create table SP_TEST (
id number,
value varchar
);
-- insert data
insert into SP_TEST values (1, 'value 1');
insert into SP_TEST values (2, 'value 2');
-- create a stored procedure
CREATE PROCEDURE DYNAMIC_QUERY(ID number)
RETURNS VARCHAR
language sql
as
declare output varchar;
BEGIN
select value into :output from SP_TEST where id = :id;
return output;
END;
-- SP call
call DYNAMIC_QUERY(2);
Return a table from a SQL query
create or replace procedure returning_table()
returns table(id number, name varchar)
language sql
as
declare
result RESULTSET DEFAULT (SELECT 1 id, 'test value' name);
begin
return table(result);
end;
Execute multiple SQL statements
create or replace procedure clean_empty_tables(db_name varchar)
returns varchar
language sql
as
declare
table_list RESULTSET DEFAULT (SELECT TABLE_NAME, TABLE_TYPE, TABLE_SCHEMA FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES WHERE table_catalog = :db_name and row_count = 0 and deleted is null);
c1 CURSOR FOR table_list;
drop_statement varchar;
begin
for record in c1 do
drop_statement := 'DROP ' || record.table_type || ' ' || :db_name || '.' || record.table_schema || '.' || record.table_name || ';';
execute immediate drop_statement;
end for;
return 'cleaning done;';
end;
Send alerts for suspended tasks
Here's an example from a previous blog post. This stored procedure, named task_state_monitor
, takes a task_name
as a parameter and checks the state of the task. If the state is 'suspended', it sends an email alert and returns a message. Otherwise, it returns a message indicating that the task state is okay.
create or replace procedure task_state_monitor(task_name string)
returns varchar not null
language SQL
AS
$$
DECLARE
task_state string;
c CURSOR FOR SELECT "state" from table(result_scan(last_query_id())) where "name" = ?;
BEGIN
show tasks;
open c USING (task_name);
fetch c into task_state;
IF(task_state = 'suspended') THEN
CALL SYSTEM$SEND_EMAIL(
'my_email_int',
'Email alert: Task is suspended!',
'Please check the task state.'
);
RETURN 'Email has been sent.';
ELSE
RETURN 'Task state is ok.';
END IF;
END;
$$
;