
Snowflake JSON Data Parsing Guide | 2024

  • Tomáš Sobotík
    Lead Data Engineer at Tietoevry
  • Ian Whitestone
    Co-founder & CEO of SELECT

In a previous blog post, we explored the basics of semi-structured data and the support Snowflake provides for it, along with various methods for ingesting semi-structured data into Snowflake. Now that the data is in Snowflake, what's the next step? In this blog post, we'll focus on how to work effectively with JSON data specifically, delving into the transformation capabilities and functions Snowflake offers for this purpose. We'll also finish with a quick recap of how to load JSON data into Snowflake.

Snowflake JSON Parsing Functions

Snowflake provides a bunch of different functions for working with semi-structured JSON data. Three of the most common ones you'll use are:

  1. PARSE_JSON: takes a string input as a JSON document and outputs a VARIANT value
  2. TRY_PARSE_JSON: this works in the exact same way as PARSE_JSON, but produces a null value if there is an error during parsing
  3. LATERAL FLATTEN: used to flatten an array into multiple rows, one per element in the array.

We'll cover each of these in more detail below. But first, we need some sample data to work with!

Creating Sample JSON Data

Throughout the blog post, we're going to work with the following sample JSON data containing information about a person:

{
    "person": {
        "name": "John Doe",
        "address": {
            "city": "London",
            "street": "Oxford Street"
        },
        "phone": [
            "Apple iPhone",
            "Google Pixel",
            "Samsung Galaxy"
        ]
    }
}
Example Snowflake sample JSON data

We'll load this data into a table called sample_json_data, and use it for testing all the different features. The table will hold the JSON document in a single VARIANT column called src:

create table sample_json_data (
    src variant
);

How to use PARSE_JSON

In order to convert a JSON document string into a VARIANT, we need to use the PARSE_JSON function.

select parse_json('{
    "person": {
        "name": "John Doe",
        "address": {
            "city": "London",
            "street": "Oxford Street"
        },
        "phone": [
            "Apple iPhone",
            "Google Pixel",
            "Samsung Galaxy"
        ]
    }
}') as output_json_variant_column;

Here's the output:

Snowflake PARSE_JSON output

Now we can use this function to populate our sample_json_data table:

insert into sample_json_data
select parse_json('{
    "person": {
        "name": "John Doe",
        "address": {
            "city": "London",
            "street": "Oxford Street"
        },
        "phone": [
            "Apple iPhone",
            "Google Pixel",
            "Samsung Galaxy"
        ]
    }
}');

How to use TRY_PARSE_JSON

TRY_PARSE_JSON works in the exact same way as PARSE_JSON:

Snowflake TRY_PARSE_JSON output example 1

However, unlike PARSE_JSON, it will return null instead of raising an error if it fails to parse the JSON. In the example below, I've removed one of the commas from the JSON document, making it invalid, so TRY_PARSE_JSON returns null:

Snowflake TRY_PARSE_JSON output example 2

If we run PARSE_JSON on this bad data, it will fail with an error:

Snowflake PARSE_JSON failing
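Here's a minimal sketch you can run yourself to compare the two functions on the same malformed document (note the missing comma after "John Doe"):

select try_parse_json('{
    "person": {
        "name": "John Doe"
        "address": {
            "city": "London",
            "street": "Oxford Street"
        }
    }
}') as try_parse_output; -- returns NULL because the document is invalid

-- Swapping parse_json in for try_parse_json on the same string fails with a JSON parsing error.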

Notation for Querying a JSON Column

Now, let's begin accessing data within this semi-structured data type. To retrieve data from a top-level key, we can use the colon notation, like src:person. If you need to delve deeper into the object structure, you have two options. You can either use dot notation with a statement like src:person.address.city, or you can use bracket notation with a statement such as src['person']['address']['city'].

Snowflake JSON parsing syntax

We can summarize these methods with the following general statements:

  • Dot notation: <column>:<level1_element>.<level2_element>.<level3_element>
  • Bracket notation: <column>['<level1_element>']['<level2_element>']['<level3_element>']

If some keys do not adhere to Snowflake SQL identifier rules — for example, if a key contains spaces — you must enclose the key name in double quotes when using dot notation. With bracket notation, the key names are always enclosed in single quotes.
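To make this concrete, here's how both notations look against our sample table; the two expressions return the same value:

select
    src:person.address.city::varchar          as city_dot_notation,
    src['person']['address']['city']::varchar as city_bracket_notation
from sample_json_data;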

How to Use LATERAL FLATTEN to Process Arrays

In the previous section, we discussed the basics of navigating a JSON document using dot and bracket notation. However, what happens if your document contains arrays? Using the same logic, selecting src:person:phone would return the entire array as a single value, like [ "Apple iPhone", "Google Pixel", "Samsung Galaxy" ].

To iterate through the array and expand individual values, we need to flatten it. This will result in three individual rows, each containing a single value from the array.

Snowflake offers two functions for this purpose: LATERAL and FLATTEN, which are often used together.

The LATERAL join allows an inline view to reference columns from a preceding table expression. Unlike a non-lateral join, the output from a lateral join includes only the rows generated from the inline view. It's worth noting that, in my experience, I have never used a LATERAL join without the FLATTEN function, although it's technically possible.

The FLATTEN function explodes compound values into multiple rows. It takes a semi-structured data type (VARIANT, OBJECT, or ARRAY) and produces a lateral view. Let's illustrate this with an example of how to expand our array of mobile phones into multiple rows using these two functions:

SELECT
    p.*
FROM sample_json_data,
    LATERAL FLATTEN(input => src:person.phone) p;

This query produces the following result:

Snowflake LATERAL FLATTEN example

The output of the FLATTEN function includes several columns:

  • SEQ: A unique sequence number associated with the input record; note that the sequence may not be ordered or gap-free.
  • KEY: For maps or objects, this column contains the key of the exploded value.
  • PATH: The path to the element within a data structure that is being flattened.
  • INDEX: The index of the element if it is part of an array; otherwise NULL.
  • VALUE: The value of the element in the flattened array or object.
  • THIS: The specific element being flattened, useful in recursive flattening.
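To build intuition for these columns, here's a quick sketch that selects a few of them for our phone array; each row shows the element's position (INDEX), its path within the array (PATH), and the element itself (VALUE):

SELECT
    p.seq,
    p.index,
    p.path,
    p.value
FROM sample_json_data,
    LATERAL FLATTEN(input => src:person.phone) p;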

Additionally, columns from the left side of the LATERAL join remain accessible. Therefore, we can modify our query to convert the semi-structured data from the JSON document into a flat structure:

SELECT
    src:person.name::varchar AS person_name,
    src:person.address.city::varchar AS city,
    src:person.address.street::varchar AS street,
    p.value::varchar AS mobile_phone
FROM sample_json_data,
    LATERAL FLATTEN(input => src:person.phone) p;

Here's the output:

Snowflake LATERAL FLATTEN example with type casting

In this SQL query, we explicitly cast each column because the raw values come back as VARIANTs, whose string values are displayed enclosed in double quotes. Casting converts them to the desired data types.
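For example, selecting the name with and without a cast shows the difference; the uncast value comes back as a quoted VARIANT, while the cast returns a plain string:

SELECT
    src:person.name          AS name_variant, -- "John Doe" (VARIANT, displayed with double quotes)
    src:person.name::varchar AS name_string   -- John Doe (VARCHAR)
FROM sample_json_data;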

Working with Multiple Levels of Nested Arrays

If you encounter a scenario with multiple levels of nested arrays, where one array contains another, you can use the LATERAL FLATTEN function on the expanded values from level 1 to further expand array values from the nested level. The pseudocode for such an operation would look like this:

SELECT
    lvl_1.value,
    lvl_2.value
FROM table,
    LATERAL FLATTEN (input => src:topLevelArray) AS lvl_1,
    LATERAL FLATTEN (input => lvl_1.value:innerLevelArray) AS lvl_2;
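As a concrete sketch, imagine a hypothetical document where each phone is an object with its own accessories array. The double flatten would then look like this:

WITH nested_sample AS (
    SELECT PARSE_JSON('{
        "person": {
            "phones": [
                {"model": "Apple iPhone", "accessories": ["case", "charger"]},
                {"model": "Google Pixel", "accessories": ["earbuds"]}
            ]
        }
    }') AS src
)
SELECT
    lvl_1.value:model::varchar AS phone_model,
    lvl_2.value::varchar AS accessory
FROM nested_sample,
    LATERAL FLATTEN(input => src:person.phones) lvl_1,
    LATERAL FLATTEN(input => lvl_1.value:accessories) lvl_2;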

That's essentially it. By combining dot notation or bracket notation with LATERAL FLATTEN, you can build a strong foundation for working with JSON data stored in Snowflake tables. This approach enables you to transform the data into the format you need efficiently.

Constructing JSON Documents from Relational Data

We've covered how to process JSON data. What about the other way around, where we want to turn structured data into JSON?

Converting relational data in a Snowflake table into JSON is a common need, especially when sharing data with external systems that expect JSON payloads for their APIs. Let's explore how to do this using Snowflake's native functions.

Imagine we have a table named relational_data, created from the output of the LATERAL FLATTEN function. Our goal is to transform this relational data back into a JSON structure. First, we create and populate the relational_data table:

CREATE TABLE relational_data (
    person_name VARCHAR,
    city VARCHAR,
    street VARCHAR,
    mobile_phone VARCHAR
);

INSERT INTO relational_data
SELECT
    src:person.name::varchar AS person_name,
    src:person.address.city::varchar AS city,
    src:person.address.street::varchar AS street,
    p.value::varchar AS mobile_phone
FROM sample_json_data,
    LATERAL FLATTEN(input => src:person.phone) p;

To construct a JSON object with key-value pairs, we use the OBJECT_CONSTRUCT function. This function takes pairs of arguments, where the first is a key and the second is a value. For nested objects, we use another OBJECT_CONSTRUCT call. To create a basic person structure with a name, the SQL statement would be:

SELECT OBJECT_CONSTRUCT(
    'person', OBJECT_CONSTRUCT('name', person_name)
) AS src
FROM relational_data;

This query produces the following JSON structure:

{
    "person": {
        "name": "John Doe"
    }
}

Next, let's include the address object with values for city and street:

SELECT OBJECT_CONSTRUCT(
    'person', OBJECT_CONSTRUCT(
        'name', person_name,
        'address', OBJECT_CONSTRUCT(
            'city', city,
            'street', street
        )
    )
) AS src
FROM relational_data;
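Run against our sample data, this should produce an object like the one below (Snowflake displays object keys in alphabetical order, so address appears before name):

{
    "person": {
        "address": {
            "city": "London",
            "street": "Oxford Street"
        },
        "name": "John Doe"
    }
}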

Finally, to add an array of mobile phones, we use the ARRAY_AGG() function, which creates an array from input values. Here's the complete SQL query for the final JSON structure:

SELECT OBJECT_CONSTRUCT(
    'person', OBJECT_CONSTRUCT(
        'name', person_name,
        'address', OBJECT_CONSTRUCT(
            'city', city,
            'street', street
        ),
        'phone', ARRAY_AGG(mobile_phone) WITHIN GROUP (ORDER BY person_name) OVER (PARTITION BY person_name)
    )
) AS src
FROM relational_data;

This process successfully reconstructs the JSON document:

{
    "person": {
        "address": {
            "city": "London",
            "street": "Oxford Street"
        },
        "name": "John Doe",
        "phone": [
            "Apple iPhone",
            "Google Pixel",
            "Samsung Galaxy"
        ]
    }
}

With these steps, we've effectively transformed relational data into a structured JSON document, ready for use in various applications.

How to load JSON data

In the examples above, we have used sample JSON data. Most Snowflake customers won't have this, and will instead need to start by loading JSON data into their account.

To load a JSON data file into a Snowflake table, the SQL command would look something like this:

COPY INTO <table_name>
FROM (
    SELECT
        $1:person:name::STRING as name,
        $1:person:address:city::STRING as city,
        $1:person:address:street::STRING as street,
        $1:person:phone[0]::STRING as phone1,
        $1:person:phone[1]::STRING as phone2,
        $1:person:phone[2]::STRING as phone3
    FROM @<my_stage>/<my_json_file.json>
)
FILE_FORMAT = (TYPE = 'JSON');

This command assumes that:

  • <table_name> is the name of your destination table in Snowflake.
  • <my_stage> is the name of the stage where the JSON file is stored.
  • <my_json_file.json> is the name of your JSON file.
  • The FILE_FORMAT = (TYPE = 'JSON') clause tells Snowflake to parse the staged file as JSON; alternatively, you can define a JSON file format on the stage itself and omit this clause.

In the SELECT statement:

  • The $1 notation is used to refer to the first column of the file being loaded, which in this case is the entire JSON blob.
  • The colon : notation is used to navigate the JSON structure.
  • The ::STRING cast converts the JSON elements into the appropriate data type, which is STRING in this example. You can adjust the data type according to your specific needs.
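If you don't yet have a stage with your JSON file on it, a minimal setup might look like the following sketch (the names here are hypothetical; adjust them to your environment):

-- hypothetical stage name, shown for illustration only
CREATE STAGE my_stage FILE_FORMAT = (TYPE = 'JSON');

-- upload the local file to the stage using the SnowSQL CLI
PUT file:///tmp/my_json_file.json @my_stage AUTO_COMPRESS = FALSE;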

Handling of varying JSON data structures

This example assumes that the JSON structure is consistent across all records in your JSON file. If the structure varies, you might need additional error handling or parsing logic.

For a deeper dive into the different options for loading semi-structured data (including JSON data) into your Snowflake data warehouse, refer to our previous blog post.

Tips for working with JSON Data

Here are a few other helpful tips to keep in mind when working with JSON data:

  • Use a free online JSON formatter to format your data. This makes it easier to visually inspect the structure and understand the data hierarchy.
  • Build your queries one step at a time and validate the output after each step. For example, don't try to query 4 levels of nested JSON at once. This will make it much easier to catch your mistakes, which will inevitably happen.
  • Use temporary columns to simplify your operations. For example, you can parse a few levels of JSON at once into one column, then continue processing from that column. You can discard the extra columns later on using Snowflake's exclude command. Here's an example that shows how to remove repetitive processing of the address field. It will only output two columns: city and street:
with data as (
    select
        src:person.address as address,
        address:city::string as city,
        address:street::string as street
    from sample_json_data
)
select * exclude(address)
from data;
  • When creating new datasets from JSON data, try to split frequently accessed fields out into separate columns, rather than leaving them in a single variant column. It will make the end user experience of querying your tables much better.
  • JSON attributes are case sensitive. If I type address.City instead of address.city, it will return null, as the quick check below shows.
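Here's a quick check against our sample table that illustrates the case sensitivity:

select
    src:person.address.city::varchar as city_lowercase, -- 'London'
    src:person.address.City::varchar as city_uppercase  -- NULL, because the key "City" does not exist
from sample_json_data;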

Wrapping up

That concludes our guide. You now understand how to parse a JSON document, flatten it, and store it in a relational table in Snowflake. Additionally, we've explored the reverse process — creating JSON from a Snowflake table. Our examples focused on simple data structures, but the principles apply to JSON documents of any complexity or size. It's all about skillfully combining these functions, as demonstrated in our straightforward examples. As you tackle more complex structures, these functions might become part of Common Table Expressions (CTEs), but the underlying concept remains the same.

Happy coding! 🧑‍💻

Tomáš Sobotík
Lead Data Engineer at Tietoevry
Tomas is a longstanding Snowflake Data SuperHero and general Snowflake subject matter expert. He is currently the lead data engineer and architect at Tietoevry, a leading data consulting firm. His extensive experience in the data world spans over a decade, during which he has served as a Snowflake data engineer, architect, and admin on various projects across diverse industries and technologies. Tomas is a core community member, actively sharing his expertise and inspiring others. He's also an O'Reilly instructor, leading live online training sessions.
Ian Whitestone
Co-founder & CEO of SELECT
Ian is the Co-founder & CEO of SELECT, a software product which helps users automatically optimize, understand and monitor Snowflake usage. Prior to starting SELECT, Ian spent 6 years leading full stack data science & engineering teams at Shopify and Capital One. At Shopify, Ian led the efforts to optimize their data warehouse and increase cost observability.
