Event Tables for Structured Logging & Tracing in Snowflake
- Date
- Tomáš SobotíkLead Data Engineer at Tietoevry
Tracking your data pipeline or application is essential in development. Without proper logging and tracing, maintaining smooth operation is challenging. Alerts are needed to provide context during disturbances. If discrepancies occur, you must trace the issue through the transformation logic. For debugging, gaining insights into your code's behavior is crucial. Examples include needing context information like variable values, output messages, and formula results, typically logged in various forms. Snowflake offers native logging and tracing features using industry-standard libraries, making them accessible for developers. Let's explore how to establish event logging and tracing in Snowflake using these features.
What are Event Tables?
Event tables have been released as public preview in May 2023. They are a special type of Snowflake table with several differences compared to standard tables:
- predefined set of columns which can’t be modified
- used only for tracking logging and tracing data
- you can have only one active event table associated with account
Typical use cases for using event table is capturing logging information from your code handlers used as stored procedures, UDFs or collecting tracing data from native apps.
Working with Event Tables
Enabling event table for your account is done in several steps, as shown in the following image:
1. Create EVENT TABLE
First we have to create an event table. This is done with special CREATE TABLE
statement:
CREATE EVENT TABLE my_db.logging.my_event_table;
You do not have to specify the table columns because it contains predefined list of columns. For now you can have only one active event table for whole account.
2. Assign the Event table to account
To get event table in use we have to associate it with our account. It’s done with ALTER ACCOUNT
statement, which means it can be done only with ACCOUNTADMIN
role. Plus there is needed to have either OWNERSHIP OR INSERT privileges for the event table.
ALTER ACCOUNT SET EVENT_TABLE = my_db.logging.my_event_table;
3. Start capturing the log events
Now we can enrich the UDF/UDTF or Stored procedure with logging code. Based on your handler language, you can use native logging APIs and libraries.
Language | Logging Library |
---|---|
Java | SLF4J API |
JavaScript | Snowflake JavaScript API |
Python | logging module |
Scala | SLF4J API |
Snowflake scripting | Snowflake SYSTEM$LOG function |
Let’s take a Python as an example.
CREATE OR REPLACE FUNCTION my_UDF()
RETURNS VARCHAR
RUNTIME_VERSION = 3.8
HANDLER = 'run'
AS $$
import logging
logger = logging.getLogger("my_logger")
def run():
logger.info("Processing start")
...
...
logger.error("Some error in your code")
return value
To start logging we have to import the logging module and instantiate the logger object. Then we can start using it in the same way like any standard Python app and log different levels like INFO
, WARNING
or ERROR
.
Let’s take one more example for SQL scripting and how to add logging in case you have your handler code in SQL. In case of Snowflake Scripting we must use SYSTEM$LOG
function. It also supports different levels of log messages like info
, warning
or error
.
We have simple stored procedure returning a table. In case we want to enrich it with logging info messages we can do following:
create or replace procedure returning_table()
returns table(id number, name varchar)
language sql
as
declare
result RESULTSET DEFAULT (SELECT 1 id, 'test value' name);
begin
SYSTEM$LOG('info', 'Returning a table');
return table(result);
end;
4. Querying the event table
We have added the logging into our handler code, now we want to check the logged events. It’s time for querying the event table. Each logged message contains:
- Timestamp - when was the event logged
- Scope - e.g. name of the class where the log event was created
- severity level of the log - e.g. info, warning, error
- log message itself
For complete list of event table columns, refer to the documentation. Some of the columns are key-value pairs to store multiple attributes. You can extract them with similar queries:
select
resource_attributes:"snow.database.id"::number,
resource_attributes:"snow.database.name"::varchar,
resource_attributes:"snow.executable.name"::varchar,
resource_attributes:"snow.executable.type"::varchar,
resource_attributes:"snow.owner.name"::varchar,
resource_attributes:"snow.query.id"::varchar,
resource_attributes:"snow.warehouse.name"::varchar,
resource_attributes:"telemetry.sdk.language"::varchar,
record,
value
from my_event_table;
And here is how the event table output looks:
Using traces to log structured data
Another use case for event tables is collecting trace data from your code. Trace data is structured logging information in form of key-value pairs which can provide more detailed overview of code’s behaviour than log data usually provides. Let’s go through example where we will start collecting trace data from UDF written in Python:
CREATE OR REPLACE FUNCTION trace_data_logging()
RETURNS VARCHAR
LANGUAGE PYTHON
PACKAGES = ('snowflake-telemetry-python')
RUNTIME_VERSION = 3.8
HANDLER = 'run'
AS $$
import logging
logger = logging.getLogger("tutorial_logger")
from snowflake import telemetry
def run():
telemetry.set_span_attribute("my_function.run", "begin")
telemetry.add_event("event_with_attributes", {"my_function.my_key1": "my_value1", "my_example.my_key2": "my_value2"})
logger.info("Logging in my function.")
return "SUCCESS"
$$;
We have to import the snowflake-telemetry-package
which contains the required methods.
We can use set_span_attribute
method to set key-value pairs to span object. Span objects hold telemetry data created after the function or procedure is executed successfully. It represents the execution unit of the UDF or stored procedure. You can add multiple events to that execution unit with add_event
method.
If you want to know more how Snowflake represents the trace events, refer to their documentation.
Event Table for Native Applications
You can use the event table for collecting logging events and telemetry data in your native apps. It requires additional configuration as the native app code runs in consumer account where events are collected. It requires configuration on both ends - provider and consumer accounts.
Provider setup in a nutshell:
- Configure log and event level in manifest file
- Configure an account to store events
Consumer setup in a nutshell:
- Set up an event table
- Review the events in the event table
- Enable logging and sharing the events with provider
Both consumer and provider have access the event table and logged entries. As a result, it’s important for the consumer to review what kind of information is logged and shared with the provider before enabling it.
Snowflake provides detailed overview with step by step instructions for both providers and consumers. If you want to learn more about overall setup, check the documentation.
Alerts based on events
Event tables can be easily combined with Snowflake Alerts to automate notifications based on logged events. Let’s try to create an alert which will be running once per hour and in case there will be a logged error it will send en email.
CREATE OR REPLACE ALERT alert_logged_errors
WAREHOUSE = my_warehouse
SCHEDULE = '60 MINUTE'
IF (EXISTS (
SELECT *
FROM my_event_table
WHERE record:"severity_text"::VARCHAR == 'ERROR' and timestamp BETWEEN SNOWFLAKE.ALERT.LAST_SUCCESSFUL_SCHEDULED_TIME()
AND SNOWFLAKE.ALERT.SCHEDULED_TIME()
))
THEN CALL SYSTEM$SEND_EMAIL(...);
Event Table Limitations & Gotchas
There is a limit for log and trace payload size. It can’t be over 1 MB.
Event table is available in ACCOUNT_USAGE
view with list of all tables in your account. It has EVENT TABLE as a value for TABLE_TYPE
column.
select *
from snowflake.account_usage.tables
where table_name = 'MY_EVENT_TABLE'
;
Event table can’t be updated. If you try to run update statement, you will get an error saying that UPDATE statement's target must be a table
. Supported operations on EVENT TABLES are:
- SHOW EVENT TABLE
- DESCRIBE EVENT TABLE
- DROP TABLE
- UNDROP TABLE
- TRUNCATE TABLE
- DELETE
- ALTER TABLE
Event Table Pricing and Billing
Collecting the log events is billed as a Serverless feature. Snowflake uses Snowflake managed resources to collect events, meaning they don’t require one of your virtual warehouses. If you want to check how much you have been charged for this feature, you can query the EVENT_USAGE_HISTORY
view:
select
start_time,
end_time,
credits_used,
bytes_ingested
from snowflake.account_usage.EVENT_USAGE_HISTORY
order by start_time desc;