
Snowflake Summit 2024: All 28 announcements and my takeaways

  • Ian Whitestone
    Co-founder & CEO of SELECT

In 2022, Snowflake made major announcements around their new Native Application framework and hybrid tables (initially announced as Unistore). Last year, they kept the bar high by announcing more new features like Snowpark Container Services and adding Iceberg support.

Having attended the past two summits, Niall and I have been consistently blown away by the pace of development at Snowflake. After Summit 2024 wrapped up, this feeling remains unchanged.

Let’s dive in.

Snowflake Summit 2024

Announcements Summary

If you’re short on time, here’s a full list of everything they announced along with a brief explainer.

Data Engineering

  • They are open-sourcing Polaris Catalog in 90 days, which will allow customers to self-host their Iceberg catalogs and provide greater interoperability between various computing engines.
  • Serverless Tasks Flex, giving customers a way to save up to 42% by allowing for more flexibility for when their tasks are run (Private Preview).
  • Event-driven tasks are now in public preview, allowing you to automatically trigger a task run whenever a Snowflake stream updates.
  • Low Latency Tasks, now in private preview, allowing for reduced task-scheduling intervals, down to 15 seconds.
  • Iceberg Tables, an open-source table format that allows customers to store their data in cloud storage as Parquet data files while still being able to query them from Snowflake, are now in General Availability (GA).
  • Dynamic Tables are now in GA, providing a declarative approach to creating incremental transformations and simple data pipelines using SQL SELECT statements.
  • Data Quality Monitoring: Snowflake now provides out-of-the-box system metrics (such as null count) or custom metrics for data that customers can define and automatically measure on tables. This feature is in public preview and only available on Enterprise or higher editions of Snowflake.
  • Native connectors for Postgres and MySQL will be in public preview soon. This will allow customers to easily replicate CDC data from their internal company databases to Snowflake with very low latency and, most importantly, without paying a third-party company (like Fivetran) for every row they load.

Analyst & Developer Tooling

  • Snowflake Notebooks are now in public preview, allowing for an end-to-end interactive UI environment for data & AI teams.
  • Snowflake announced a new suite of observability features with the launch of Snowflake Trail, giving developers and partners more tools to monitor and debug their data pipelines.
  • Snowpark Container Services (SPCS) is now in GA, allowing Snowflake users to securely run any workload in Snowflake via a container.
  • As part of this, Snowflake announced public preview support for SPCS with the native apps framework, allowing partners to build and deploy rich applications.
  • Snowflake’s own Copilot will soon be in GA, allowing analysts and business users to automatically generate SQL queries from written text directly in the Snowflake UI.
  • Snowflake’s Git Integration is now in Public Preview, which allows users to sync files between GitHub/GitLab/Bitbucket and Snowflake for better version control and management.

Data Governance & Security

  • A new internal marketplace in private preview, which will enable large organizations and teams to privately share datasets, apps, and notebooks across their company.
  • New Table Governance Views in public preview, showing the top roles and users accessing each table in your Snowflake account.
  • Snowflake’s Trust Center, an interface for discovering security risks along with recommendations to resolve them, will be GA soon.
  • Universal Search is now in GA, allowing users to quickly search and discover all of their Snowflake assets directly in the UI.
  • Snowflake showed off a new Data Lineage Visualization interface in the UI, currently in private preview, which will allow users to see upstream and downstream dependencies for all tables and views.
  • AI-Powered Object Descriptions were announced, which automatically add relevant context and comments to tables and views using AI.

Enterprise AI & ML

  • Snowflake released a new Snowpark pandas API, which allows Python users to execute their pandas code on Snowflake compute in a parallelized fashion, eliminating common pandas limitations around data volumes & performance.
  • Document AI will be GA soon. Document AI provides serverless, LLM-based document processing for extracting structured data from unstructured business documents such as invoices, contracts, and product test sheets (PDFs, images, Word files, etc.).
  • Snowflake Feature Store, now in public preview, allows users to create, manage, and serve ML features with continuous, automated refresh on batch or streaming data using UI or Snowpark ML APIs.
  • Snowflake Model Registry is now in GA, which provides an integrated solution to manage, track, version and share AI/ML models and their metadata natively in Snowflake via UI or Snowpark ML APIs.
  • Snowsight AI & ML Studio is now in GA, providing a UI which allows users to easily create various ML & AI models & pipelines.
  • Cortex Finetuning is now in Public Preview, which allows foundation LLMs to be fine-tuned directly from the UI.
  • Cortex Analyst, currently in private preview, is a serverless & highly accurate LLM service that allows business users to ask a question and receive an actual answer, not just SQL text. Analyst will be powered by a YAML file which defines things like the data schema, metrics, and synonyms to control the scope and accuracy.
  • Cortex Search, also in private preview, is a serverless LLM-based hybrid (Vector + Keywords) search service that incrementally extracts data from docs, tables & views, then automatically chunks and vectorizes it for fast (< 200ms) queries & highly accurate search performance.

Quite a lot to digest! For the rest of the post, I'll dive into what was discussed in terms of Snowflake's direction and go deeper into some of the bigger announcements, along with their impact.

Opening Remarks & Snowflake’s Philosophy

In the opening remarks of the summit keynote, Sridhar and Benoit reiterated the core principles of the Snowflake platform. They emphasized that Snowflake is one platform built on top of one engine that just works. They keep it simple and don’t give you 20 different ways to do the same thing. This point didn’t fully resonate until I attended Databricks Summit the week after and learned just how complex their product is.

As Snowflake recently rebranded from the “Data Cloud” to the “AI Data Cloud”, the opening remarks were largely oriented on how the platform provides everything you need for Enterprise AI:

  1. Data. Snowflake provides highly performant native storage and supports the open Iceberg format for customers who want to retain full control over their data and make it accessible to other compute engines (e.g. Spark or Trino).
  2. Compute. Everyone is familiar with Snowflake’s virtual warehouses, primarily used for SQL queries. Internally, Snowflake refers to its virtual warehouse technology as a "data flow engine". All languages (Java, Python, and Scala) push down to this engine, not just SQL. With Snowpark Container Services, Snowflake is now offering an even more flexible compute option, which allows users to run any application or workload in a secure container (think: if it can run in Docker, it can run in Snowflake).
  3. AI, the big topic of the day. With Cortex, Snowflake customers have access to a suite of tools to make using and building AI applications simple. They also discussed Arctic, a new family of LLMs designed by Snowflake, specifically targeted towards enterprise use cases.
  4. Security, governance, and collaboration. Snowflake has long been known for its strong security guarantees, ease of governance, and robust access control. While it’s not something the typical user thinks about, these features and guarantees are table stakes for every company.

Polaris Catalog and More Interoperability Across the Industry

One of the most talked about announcements came days before Summit started, when Snowflake announced they would be open-sourcing their Iceberg catalog, Polaris. This is significant largely because it should improve usability and interoperability for Iceberg data lakes across cloud platforms.

Snowflake polaris catalog

Another reason this was widely discussed is that Databricks simultaneously announced it was acquiring Tabular, a company founded by the creators of Iceberg and many of the key people behind its development.

This clearly means that Databricks will be adding first-class support for Iceberg in their product, and I imagine we will see Snowflake follow suit at some point by adding support for Delta Lake. While it’s a bit unsettling to know that one company will effectively “own” the two most popular open-source data formats, I think in the end it will ultimately benefit customers significantly as all cloud data platforms will be forced to offer support for more open data formats.

As part of this, Snowflake also announced a major expansion of their partnership with Microsoft around Fabric. At the heart of Fabric is OneLake, which has historically stored data as Delta Lake Parquet files. Now, Microsoft will support Iceberg and Snowflake will integrate with OneLake.

For those who are curious why you need a separate catalog for Iceberg, I found this snippet from the Iceberg documentation helpful:

You may think of Iceberg as a format for managing data in a single table, but the Iceberg library needs a way to keep track of those tables by name. Tasks like creating, dropping, and renaming tables are the responsibility of a catalog. Catalogs manage a collection of tables that are usually grouped into namespaces. The most important responsibility of a catalog is tracking a table's current metadata, which is provided by the catalog when you load a table.

The Iceberg project didn’t provide a catalog itself. Instead, it set the standards for what catalogs should do and how they interact with Iceberg. In true Snowflake fashion, they are drastically lowering the barrier to adopting complex technologies like Iceberg by providing the missing pieces.
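To make those catalog responsibilities concrete, here is a toy in-memory sketch in Python. It is not Polaris or any real Iceberg catalog: the class name, method names and storage paths are all made up for illustration. The `commit` method reflects my understanding that Iceberg commits work by atomically swapping a table's metadata pointer, which is an assumption layered on top of the quote above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyCatalog:
    """Illustrative stand-in for an Iceberg catalog (hypothetical API)."""

    # (namespace, table name) -> location of the table's current metadata file
    tables: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def create_table(self, namespace: str, name: str, metadata_location: str) -> None:
        if (namespace, name) in self.tables:
            raise ValueError(f"{namespace}.{name} already exists")
        self.tables[(namespace, name)] = metadata_location

    def drop_table(self, namespace: str, name: str) -> None:
        del self.tables[(namespace, name)]

    def rename_table(self, namespace: str, old: str, new: str) -> None:
        if (namespace, new) in self.tables:
            raise ValueError(f"{namespace}.{new} already exists")
        self.tables[(namespace, new)] = self.tables.pop((namespace, old))

    def load_table(self, namespace: str, name: str) -> str:
        """Return the current metadata location, the catalog's key job."""
        return self.tables[(namespace, name)]

    def commit(self, namespace: str, name: str, expected: str, new_location: str) -> None:
        """Swap the metadata pointer only if nobody else changed it first."""
        if self.tables[(namespace, name)] != expected:
            raise RuntimeError("metadata changed since it was read; retry the commit")
        self.tables[(namespace, name)] = new_location

    def list_tables(self, namespace: str) -> List[str]:
        return sorted(n for ns, n in self.tables if ns == namespace)


catalog = ToyCatalog()
catalog.create_table("sales", "orders", "s3://bucket/orders/metadata/v1.json")
catalog.commit(
    "sales", "orders",
    expected="s3://bucket/orders/metadata/v1.json",
    new_location="s3://bucket/orders/metadata/v2.json",
)
print(catalog.load_table("sales", "orders"))
```

Any engine that asks this catalog for `sales.orders` gets pointed at the same current metadata file, which is the piece that makes multi-engine interoperability possible.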

Serverless Tasks Flex in Private Preview

For those interested in Snowflake cost optimization, Snowflake announced an exciting new feature called Serverless Tasks Flex.

With Serverless Tasks, you define a schedule and Snowflake will automatically run your query on compute resources they manage. You only pay per second for the exact compute time used.

With Serverless Tasks Flex, you will provide both a schedule and an SLA that defines how long after the scheduled start time the task is allowed to finish. Snowflake will take these two inputs and find the cheapest time to run your task while ensuring it finishes within the SLA you specify (e.g. 3 hours).
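Snowflake hasn't shared how it picks the run time, but the idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: time is split into equal slots, `prices` is a hypothetical relative cost per slot, and the function simply tries every start that still finishes inside the SLA window.

```python
from typing import List, Tuple


def cheapest_start(
    prices: List[int], scheduled_slot: int, sla_slots: int, runtime_slots: int
) -> Tuple[int, int]:
    """Return (start_slot, total_cost) for the cheapest run inside the SLA.

    The task may start at or after scheduled_slot and must finish no later
    than scheduled_slot + sla_slots.
    """
    latest_start = min(
        scheduled_slot + sla_slots - runtime_slots,  # SLA deadline
        len(prices) - runtime_slots,                 # end of known prices
    )
    if latest_start < scheduled_slot:
        raise ValueError("task cannot finish within the SLA")

    candidates = [
        (sum(prices[start:start + runtime_slots]), start)
        for start in range(scheduled_slot, latest_start + 1)
    ]
    cost, start = min(candidates)  # ties resolve to the earliest start
    return start, cost


# Hypothetical per-slot costs; the task is scheduled for slot 0, needs
# 2 slots of compute, and must be done within 4 slots of the scheduled start.
prices = [5, 5, 3, 2, 2, 4]
print(cheapest_start(prices, scheduled_slot=0, sla_slots=4, runtime_slots=2))
# prints (2, 5)
```

A looser SLA widens the search window, which is where the savings come from: with `sla_slots=6` the same task would start at slot 3 for a cost of 4.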

Snowflake serverless tasks flex

Snowpark Container Services in Native Apps are now in Public Preview

Snowflake announced that native app developers can now leverage Snowpark Container Services in their applications, allowing for richer UIs (e.g. custom React/JavaScript) and more complex applications to be built and run entirely in Snowflake.

This was a huge announcement for Snowflake partners, as it will allow companies like SELECT to provide a more secure and seamless offering of their products.

Snowflake native app announcements Summit 2024

Snowflake Notebooks in Public Preview

Snowflake announced that their Notebooks feature is now in Public Preview. Notebooks allow you to blend Python, SQL and Markdown for creating reports, jobs or performing ad hoc analysis.

You can also schedule notebooks to run on a virtual warehouse or Snowpark container. Snowflake also announced they will offer in-line Copilot for notebooks (currently in public preview).

Snowflake notebooks

Snowflake Horizon Announcements

Snowflake Horizon is Snowflake’s suite of features related to data governance, discoverability, security and privacy. There were a variety of announcements related to Horizon:

  • Universal Search
  • Table Governance Views
  • AI-Powered Object Descriptions
  • Data Lineage Visualization for Tables & Views
  • ML Lineage Visualization
  • Trust Center
  • Internal Marketplace

I’ll touch on a few of these that I think will be most significant for most Snowflake users.

Universal Search is now Generally Available

Search, accessible via the UI, allows you to search across everything in Snowflake from your internal data to the marketplace. Search is powered by Neeva's search engine tech that Snowflake acquired in 2023.

Here’s an example from our Snowflake account:

Snowflake universal search example

Table Governance Tab in Public Preview

On the Table’s page, Snowflake showed off a new Governance tab which contains information about the top queries, users and roles accessing a table. This will be helpful for users looking to understand who is using a table and how.

Snowflake table governance tab

Table Lineage in Private Preview

Snowflake showed off a new Table Lineage view on the Table’s page, which shows upstream and downstream dependencies for Views and Tables.

This is one of the most popular features offered by Data Catalog and Observability tools, so it’s awesome to see Snowflake making this available to all customers.
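For anyone new to lineage, the underlying idea is a graph walk over table dependencies. Here is a minimal Python sketch of it; the table names and edges are hypothetical, and this says nothing about how Snowflake actually computes or stores lineage.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graphs(edges: Iterable[Tuple[str, str]]):
    """Turn (source, target) pairs into upstream and downstream adjacency maps."""
    upstream: Dict[str, Set[str]] = defaultdict(set)
    downstream: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first walk returning every object reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical dependencies: each pair reads "target is built from source".
edges = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "analytics.daily_revenue"),
    ("staging.customers", "analytics.daily_revenue"),
    ("analytics.daily_revenue", "reporting.revenue_view"),
]
upstream, downstream = build_graphs(edges)

print(sorted(reachable(upstream, "analytics.daily_revenue")))
print(sorted(reachable(downstream, "staging.orders")))
```

The first call answers "what feeds this table?" and the second answers "what breaks if I change this table?", which are the two questions a lineage view is usually opened to answer.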

Snowflake data lineage

Various Performance Improvements

For those interested in query optimization, Snowflake highlighted a number of different performance improvements they’ve released over the last 12 months. All of these happen behind the scenes and silently improve the developer and user experience for all Snowflake customers.

Snowflake 2024 performance improvements

Native connectors for Postgres and MySQL

Building on the success of their recent native connectors (the Snowflake Connector for Kafka and connectors for SaaS applications like ServiceNow and Google Analytics), Snowflake announced that native connectors for Postgres and MySQL will be in public preview soon. This will allow customers to easily replicate CDC data from their internal company databases to Snowflake with very low latency and, most importantly, without paying a third-party company for every row they load.

It will be interesting to see what this means for companies like Fivetran. If I were to guess, more than half of their revenue probably comes from Postgres & MySQL data replication. Snowflake has stated they will be building more connectors in the future for other leading databases, so the revenue cannibalization can only go in one direction: up.

Snowflake Trail in Private Preview

Snowflake is continuing to make deeper investments in observability features for both data and application development.

This year at Summit they announced Snowflake Trail, a set of Snowflake capabilities for developers to better monitor, troubleshoot, debug and take actions on pipelines, apps, user code and compute utilization.

Snowflake Trail is powered by Event Tables, and helps users automatically gain better visibility into the performance of Snowpark code and resource usage.

Snowflake trail

Snowflake Trail is compliant with the OpenTelemetry spec, so it can be easily integrated with partner technologies. There will also be a basic UI on top of the event tables.

Snowpark Pandas API

In late 2023, Snowflake acquired Ponder to boost their Python capabilities.

As a result of this acquisition, Snowflake released a new Snowpark pandas API, which allows Python users to execute their pandas code on Snowflake compute in a parallelized fashion, eliminating common pandas limitations around data volumes & performance.

Users can now write regular pandas code and have it automatically execute under the hood as Snowflake SQL, giving it much better performance and the ability to handle larger-than-memory datasets.

Snowflake pandas API

Under the hood, this works by translating your pandas code into SQL queries that run in Snowflake, with the results returned as native Python objects. Very cool!
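To give a feel for what "translating pandas code into SQL" means, here is a deliberately tiny Python toy. It is not the Snowpark pandas API and not how Snowflake implements it: `LazyFrame`, `Column` and `Condition` are made-up classes that record pandas-style indexing and only build a SQL string when asked.

```python
from typing import List, Optional


class Condition:
    """A filter expression already rendered as SQL text."""

    def __init__(self, sql: str) -> None:
        self.sql = sql


class Column:
    """A column reference whose comparisons produce SQL conditions."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value) -> Condition:
        return Condition(f"{self.name} > {value!r}")

    def __eq__(self, value) -> Condition:  # type: ignore[override]
        return Condition(f"{self.name} = {value!r}")


class LazyFrame:
    """Records pandas-style operations instead of executing them."""

    def __init__(
        self,
        table: str,
        columns: Optional[List[str]] = None,
        filters: Optional[List[str]] = None,
    ) -> None:
        self.table = table
        self.columns = columns or []
        self.filters = filters or []

    def __getitem__(self, key):
        if isinstance(key, str):        # df["amount"] -> column reference
            return Column(key)
        if isinstance(key, Condition):  # df[df["amount"] > 100] -> filter
            return LazyFrame(self.table, self.columns, self.filters + [key.sql])
        if isinstance(key, list):       # df[["id", "amount"]] -> projection
            return LazyFrame(self.table, key, self.filters)
        raise TypeError(f"unsupported key: {key!r}")

    def to_sql(self) -> str:
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql


orders = LazyFrame("orders")
big_orders = orders[orders["amount"] > 100][["id", "amount"]]
print(big_orders.to_sql())
# prints SELECT id, amount FROM orders WHERE amount > 100
```

Nothing is computed in Python here. The familiar pandas syntax just describes a query, and the heavy lifting is left to whatever engine runs the SQL, which is why larger-than-memory datasets stop being a problem.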

Snowflake pandas API architecture

Pictures from the Snowflake announcement: https://www.snowflake.com/blog/snowpark-pandas-api-run-at-scale/

Snowflake Cortex

Snowflake Cortex is Snowflake’s suite of AI tooling and features. Cortex gives you access to a variety of different LLMs, including those from Mistral, Reka, Meta, Google and Snowflake Arctic.

At Summit, Snowflake announced 3 new features as part of the Cortex suite.

Cortex Finetuning (Public Preview)

Cortex Finetuning is now in Public Preview, which allows foundation LLMs to be fine-tuned directly from the UI.

Most off-the-shelf LLMs are a poor fit for companies out of the box, as they know nothing about a company’s internal data or systems. With fine-tuning, companies can train more relevant models for their internal use cases.

Cortex Analyst (Private Preview)

Cortex Analyst, currently in private preview, is a serverless & highly accurate LLM service, which allows business users to ask a question and receive an actual answer, not just SQL text.

This is one of the top use cases for AI that most people are discussing in the industry. If you arm an LLM with all the semantic knowledge and metadata of your data warehouse, and give it the ability to directly run SQL queries and interpret the results, you have something very valuable.

Cortex Analyst will be powered by a YAML file which defines things like the data schema, metrics, and synonyms to control the scope and accuracy. We’re quite excited to see how this feature unfolds. Given the complexity associated with building such a system and the business implications of getting answers wrong, I expect it will be some time before this is handling meaningful volumes of real data inquiries from the business.
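Here is a rough Python sketch of why a semantic definition with metrics and synonyms helps control scope. The structure below is hypothetical and is not the actual Cortex Analyst YAML schema, and plain substring matching stands in for the LLM. The point is only that questions mapping to a defined metric get answered, and everything else is refused.

```python
from typing import Optional

# Hypothetical semantic model, written as a dict instead of YAML.
SEMANTIC_MODEL = {
    "table": "analytics.orders",
    "metrics": {
        "total_revenue": {"expr": "SUM(amount)", "synonyms": ["revenue", "sales"]},
        "order_count": {"expr": "COUNT(*)", "synonyms": ["orders", "number of orders"]},
    },
}


def resolve_metric(question: str, model: dict) -> Optional[str]:
    """Return the first metric whose name or synonym appears in the question."""
    text = question.lower()
    for name, spec in model["metrics"].items():
        terms = [name.replace("_", " ")] + spec["synonyms"]
        if any(term in text for term in terms):
            return name
    return None


def question_to_sql(question: str, model: dict) -> Optional[str]:
    """Build SQL for in-scope questions; return None for anything else."""
    metric = resolve_metric(question, model)
    if metric is None:
        return None  # out of scope: better to refuse than to guess
    expr = model["metrics"][metric]["expr"]
    return f"SELECT {expr} AS {metric} FROM {model['table']}"


print(question_to_sql("What were our sales last month?", SEMANTIC_MODEL))
# prints SELECT SUM(amount) AS total_revenue FROM analytics.orders
print(question_to_sql("What is the weather today?", SEMANTIC_MODEL))
# prints None
```

Note the toy ignores "last month" entirely. Handling time ranges, joins and ambiguous wording correctly is exactly the hard part, and a big reason accuracy on real business questions is difficult.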

Cortex Search (Private Preview)

Cortex Search, also in private preview, is a serverless LLM-based hybrid (Vector + Keywords) search service, which incrementally extracts data from docs, tables & views, then automatically chunks and vectorizes it for fast (< 200ms) queries & highly accurate search performance.

During the Summit keynote, they got a random audience member to come up on stage. The person had only logged into Snowflake a handful of times and was able to get a working service in under 5 minutes just by clicking around in the UI.

During the demo, they selected a stage, which was preloaded with a bunch of example PDFs and then in a few minutes were presented with a chat interface where they could ask questions and the LLM would respond with answers based directly on the contents of those PDFs.
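If "hybrid (Vector + Keywords)" search sounds abstract, the Python toy below shows the general shape: split documents into chunks, score each chunk by both vector similarity and keyword overlap, then blend the two. It is only a sketch. Word-count vectors stand in for real LLM embeddings, the `alpha` weight and chunk size are arbitrary, and none of this reflects how Cortex Search is actually built.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 8) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_score(query_tokens: List[str], chunk_tokens: List[str]) -> float:
    """Fraction of distinct query words that appear in the chunk."""
    query_terms = set(query_tokens)
    if not query_terms:
        return 0.0
    return len(query_terms & set(chunk_tokens)) / len(query_terms)


def hybrid_search(
    query: str, chunks: List[str], alpha: float = 0.5, top_k: int = 3
) -> List[Tuple[float, str]]:
    """Rank chunks by a weighted blend of vector and keyword scores."""
    q_tokens = tokens(query)
    q_vector = Counter(q_tokens)
    scored = []
    for text in chunks:
        c_tokens = tokens(text)
        score = alpha * cosine(q_vector, Counter(c_tokens)) + (
            1 - alpha
        ) * keyword_score(q_tokens, c_tokens)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical document contents standing in for the PDFs on the stage.
docs = [
    "Refunds are issued within 30 days of purchase under our refund policy",
    "Shipping takes five business days",
    "Contact support by email",
]
for score, text in hybrid_search("refund policy", docs, top_k=1):
    print(round(score, 3), text)
```

In a chat setup like the one demoed, the top-ranked chunks would then be handed to the LLM as context so that its answer is grounded in the documents.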

Dark Mode!!

To close out the keynote, Snowflake announced that one of the most requested features, Dark Mode, was now available to all customers. You can watch the video here.

I’ve been using it ever since, and am quite pleased with the look and feel!

Snowflake dark mode

Closing Thoughts

While there weren’t any foundational platform announcements as big as those of past years, like Native Apps or Snowpark Container Services, Summit 2024 did make a few things clear:

  1. The rate at which new features are being released and made generally available has significantly accelerated. Even though we didn't see any single net new announcement as big as Snowflake launching an app store or a foundational new way to run workloads (Snowpark Container Services), it felt like we got a whole lot more across the board.
  2. Snowflake is giving their customers more functionality out of the box and reducing the need to go out and buy other 3rd party tools typically required to operate a data warehouse. They've deepened their investments in data quality and data governance, all customers will soon get data lineage (the most popular feature of data catalogs), they have a killer new Notebooks feature, and they are about to release free native connectors for Postgres and MySQL (bye Fivetran?).
  3. AI & ML is the top focus area for Snowflake. This has also become increasingly clear in investor calls, as the market is expecting these new workloads (along with Snowpark) to make up a meaningful portion of revenue in the next 5 years. Right now, the bulk of Snowflake’s revenue (I’d guess over 90%) comes from traditional data warehousing and business intelligence workloads. There were lots of impressive features demoed at Summit. If these live up to what was shown on stage, there is no doubt they will have a massive impact across many organizations in terms of accelerating AI adoption.
Ian is the Co-founder & CEO of SELECT, a SaaS Snowflake cost management and optimization platform. Prior to starting SELECT, Ian spent 6 years leading full stack data science & engineering teams at Shopify and Capital One. At Shopify, Ian led the efforts to optimize their data warehouse and increase cost observability.
