Jeff Skoldberg
Friday, October 10, 2025
Snowflake Document AI transforms unstructured documents into queryable structured data using AI, processing invoices, contracts, and forms directly within Snowflake. The service uses the proprietary Arctic-TILT model to extract text, tables, and entities from PDFs and images with 90% ANLS benchmark accuracy, outperforming GPT-4. Arctic-TILT runs on a single GPU providing cost efficiency. With pricing starting at roughly $0.012–$0.018 per page on Standard Edition, Document AI tackles the fact that 80–90% of enterprise data is unstructured. It can extract information from new document types without prior training (zero-shot) and also supports fine-tuning with as few as 5–20 examples.
Document AI transforms unstructured documents into queryable structured data through a two-phase workflow: model building and inference.
Create models through the Snowsight UI at AI & ML → Document AI. You'll need the SNOWFLAKE.DOCUMENT_INTELLIGENCE_CREATOR database role:
Upload 10-20 sample documents representing your document types. Define what you want to extract using natural language questions like "What is the invoice number?", "What is the vendor name?", or "What line items are listed?". Review the extracted values, correct any errors, train the model (optional but recommended for 20+ documents), and publish for production use.
The model learns extraction patterns from your corrections. You don't need ML expertise - just answer questions about where data appears in your documents.
Create a warehouse, database, schema, and stage for document processing:
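As a rough sketch of that setup, the Python helper below only renders the SQL statements as strings so they can be reviewed before being run in a worksheet. The object names (`doc_ai_wh`, `doc_ai_db`, `doc_ai_schema`, `doc_stage`) are hypothetical, and the stage options (directory table enabled, server-side encryption) reflect what internal stages for Document AI are generally expected to use; check them against the current Snowflake documentation.

```python
def setup_statements(warehouse, database, schema, stage):
    """Render setup SQL as strings. Nothing is executed here."""
    fq_schema = f"{database}.{schema}"
    return [
        # An X-Small warehouse is enough: it only launches the job.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;",
        f"CREATE DATABASE IF NOT EXISTS {database};",
        f"CREATE SCHEMA IF NOT EXISTS {fq_schema};",
        # Assumption: internal stage with a directory table and SSE encryption.
        f"CREATE STAGE IF NOT EXISTS {fq_schema}.{stage} "
        "DIRECTORY = (ENABLE = TRUE) ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');",
    ]


# Hypothetical object names, for illustration only.
statements = setup_statements("doc_ai_wh", "doc_ai_db", "doc_ai_schema", "doc_stage")
for stmt in statements:
    print(stmt)
```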
Upload documents to your stage using PUT commands or external stage integrations to S3, Azure Blob, or GCS.
Extract data using the model!PREDICT method:
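A minimal sketch of what such a query can look like, again rendered as a string from Python rather than executed. The model build name `invoice_model`, the stage name `doc_stage`, and version `1` are hypothetical placeholders. The pattern of passing a presigned URL for each file in the stage's directory table is the commonly documented one, but verify it against your account before relying on it.

```python
def predict_query(model, stage, version=1):
    """Render a PREDICT query over every file listed in a stage's directory table."""
    return (
        "SELECT RELATIVE_PATH AS file_name, "
        f"{model}!PREDICT(GET_PRESIGNED_URL(@{stage}, RELATIVE_PATH), {version}) "
        "AS extraction "
        f"FROM DIRECTORY(@{stage});"
    )


# Hypothetical model build and stage names.
query = predict_query("invoice_model", "doc_stage", version=1)
print(query)
```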
The output returns JSON with extracted values and confidence scores:
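To make the shape concrete, here is an illustrative payload and a small Python reader for it. The field names (`invoice_number`, `vendor_name`) and all values are made up, and the layout (a list of `score`/`value` objects per field plus a `__documentMetadata` entry) is an assumption about the output format that you should confirm against your own results.

```python
import json

# Illustrative payload only. Field names and numbers are invented.
sample = """
{
  "__documentMetadata": {"ocrScoreDocument": 0.92},
  "invoice_number": [{"score": 0.98, "value": "INV-1042"}],
  "vendor_name": [{"score": 0.71, "value": "Acme Supply"}]
}
"""


def first_answer(result, field):
    """Return (value, score) for the first answer of a field, or (None, None)."""
    answers = result.get(field) or []
    if not answers:
        return None, None
    top = answers[0]
    return top.get("value"), top.get("score")


result = json.loads(sample)
print(first_answer(result, "invoice_number"))  # ('INV-1042', 0.98)
print(first_answer(result, "po_number"))       # (None, None)
```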
Transform JSON output into usable columns:
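The same flattening idea can be sketched outside SQL. The Python function below turns one extraction result into a flat row with a value column and a score column per field. It assumes each field maps to a list of `score`/`value` objects and that metadata keys start with `__`; the field names are hypothetical.

```python
def to_row(result):
    """Flatten one extraction result into {field: value, field_score: score}."""
    row = {}
    for field, answers in result.items():
        if field.startswith("__"):
            continue  # skip metadata entries such as __documentMetadata
        top = answers[0] if answers else {}
        row[field] = top.get("value")
        row[f"{field}_score"] = top.get("score")
    return row


# Hypothetical extraction result for a single document.
extraction = {
    "__documentMetadata": {"ocrScoreDocument": 0.92},
    "po_number": [{"score": 0.97, "value": "PO-7781"}],
    "total_amount": [{"score": 0.88, "value": "1,250.00"}],
    "po_date": [],
}
row = to_row(extraction)
print(row)
```

Fields with no answer come back as `None` in both columns, which makes missing extractions easy to spot downstream.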
There’s a lot more we could do here such as flattening results into multiple rows and automating with Streams and Tasks, but we want to keep the example section short.
Snowflake bills Document AI through Snowflake-managed AI Services compute, which scales automatically with workload. Credits are consumed based on compute time at 8 credits per hour (as of September 2025), and that compute time depends on page count, document density, and the number of values extracted. This is fundamentally different from the token-based pricing used for other Cortex LLM features.
Costs scale with document density and extraction complexity. Here are the official ranges from Snowflake's pricing documentation:
Low density documents (invoices, slides) - 100 documents with 10 pages each:
High density documents (research papers, legal contracts):
Table extraction (preview feature) - 1,000 documents with 2-10 pages each:
Purchase order processing example:
Let's say you process 10,000 purchase orders annually. Each PO is a single page with low density (typical PO format). You want to extract Customer Name, Customer Address, PO Number, PO Date, Line Items (table), and Total Amount. That's 10,000 total pages with approximately 10 values extracted per document. Using the low density estimate for 1,000 documents with 1 page each, you'd consume 9-12 credits per 1,000 pages, resulting in 90 to 120 credits annually.
Using $3 per credit, which is common for Enterprise Edition, the estimated cost is $270 to $360 per year. The actual cost depends on the amount of compute used.
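The arithmetic behind this estimate is easy to rerun with your own volumes. The sketch below uses the figures from the example (10,000 pages, 9 to 12 credits per 1,000 pages, $3 per credit); the function name and signature are just for illustration.

```python
def annual_cost_range(pages_per_year, credits_per_1000_pages, dollars_per_credit):
    """Return ((low_credits, high_credits), (low_dollars, high_dollars))."""
    low_rate, high_rate = credits_per_1000_pages
    blocks = pages_per_year / 1000  # number of 1,000-page blocks
    credits = (blocks * low_rate, blocks * high_rate)
    dollars = (credits[0] * dollars_per_credit, credits[1] * dollars_per_credit)
    return credits, dollars


credits, dollars = annual_cost_range(10_000, (9, 12), 3.0)
print(credits)  # (90.0, 120.0)
print(dollars)  # (270.0, 360.0)
```

Swap in your own page count, density range, and contract price per credit to get a first-pass budget figure.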
This example shows how Document AI can process thousands of documents annually for a few hundred dollars, making it cost-effective for automating manual data entry that would otherwise require significant staff time.
Virtual Warehouse Cost
Document AI requires a virtual warehouse to kick off the Document AI job. The AI does not run on one of your warehouses, but the job that starts the process does. We recommend always using an X-Small for this, as the size of the warehouse has no bearing on the speed at which the AI tasks are completed.
Storage Cost
Storage for documents in stages and for result tables is billed at your standard storage rate. For example, $23 per TB per month is common for Enterprise Edition customers.
Snowflake provides dedicated views in ACCOUNT_USAGE for tracking Document AI consumption with ~2 hour latency and 365-day retention.
The DOCUMENT_AI_USAGE_HISTORY view contains per-query credit consumption:
Aggregate usage by day to understand spending trends:
Track which users and roles consume Document AI credits:
Identify which warehouses process Document AI queries.
It may be a good idea to modify the query above to search for warehouses that are not x-small and notify when that happens.
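One way to express that check, sketched in Python over rows you might export from a monitoring query. The row keys (`query_id`, `warehouse_name`, `warehouse_size`) and the sample values are hypothetical; adapt them to whatever columns your query returns.

```python
def oversized_warehouses(rows, allowed_sizes=("X-Small",)):
    """Return rows whose warehouse size is not in the allowed list."""
    return [row for row in rows if row["warehouse_size"] not in allowed_sizes]


# Hypothetical rows, standing in for exported query results.
rows = [
    {"query_id": "q1", "warehouse_name": "DOC_AI_WH", "warehouse_size": "X-Small"},
    {"query_id": "q2", "warehouse_name": "ANALYTICS_WH", "warehouse_size": "Medium"},
]

flagged = oversized_warehouses(rows)
for row in flagged:
    print(f"Document AI query {row['query_id']} ran on "
          f"{row['warehouse_name']} ({row['warehouse_size']})")
```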
The monitoring queries above are worthless if nobody runs them. For anything you want to monitor in Snowflake, you can wrap the SQL in a scheduled task with a Notification Integration to create a custom monitor that sends alerts to Slack or Teams. If you want easy-to-use monitoring functionality, check out monitors in SELECT.
Document AI runs on Snowflake-managed serverless compute. Your warehouse only executes the SQL query wrapping the PREDICT method. Larger warehouses provide zero speed improvement for Document AI processing while increasing costs from 1 credit/hour (X-Small) to 4+ credits/hour (Medium+).
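A quick way to see the difference is to compare credits for the same amount of active warehouse time. The rates below follow Snowflake's standard warehouse sizing, where each size up doubles the hourly rate. The 50 hours is a made-up figure for illustration.

```python
# Standard warehouse credit rates per hour of active time.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}


def warehouse_credits(size, active_hours):
    """Credits consumed by a warehouse of the given size over active_hours."""
    return CREDITS_PER_HOUR[size] * active_hours


hours = 50  # hypothetical active hours spent launching Document AI jobs
print(warehouse_credits("X-Small", hours))  # 50
print(warehouse_credits("Medium", hours))   # 200
```

Because extraction speed does not change with warehouse size, the extra 150 credits in this example would buy nothing.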
Each additional extracted field increases processing time and cost. Extracting 10 values consumes 4-7 credits per 1,000 pages (medium density), while extracting 40 values consumes 16-30 credits, roughly a 4x increase by these figures.
Design extraction models targeting only fields required for downstream applications. Don't extract "everything that might be useful someday." Extract what you need now. You can always create a new model build with additional fields later.
Processing efficiency improves when documents are grouped by type. Use a classification model first to identify document types, then route to specialized extraction models:
This approach also improves accuracy since each model specializes in one document format.
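The routing step can be sketched as a simple lookup from the classified document type to a specialized model. The document types, file names, and model build names below are hypothetical, and the classification itself is assumed to have already happened.

```python
from collections import defaultdict

# Hypothetical mapping from classified type to a specialized model build.
EXTRACTION_MODELS = {
    "invoice": "invoice_model",
    "purchase_order": "po_model",
}


def route(documents, fallback="manual_review"):
    """Group file names by the model that should process them."""
    batches = defaultdict(list)
    for doc in documents:
        model = EXTRACTION_MODELS.get(doc["doc_type"], fallback)
        batches[model].append(doc["file_name"])
    return dict(batches)


# Hypothetical output of a classification pass.
classified = [
    {"file_name": "a.pdf", "doc_type": "invoice"},
    {"file_name": "b.pdf", "doc_type": "purchase_order"},
    {"file_name": "c.pdf", "doc_type": "invoice"},
    {"file_name": "d.pdf", "doc_type": "unknown"},
]
batches = route(classified)
print(batches)
```

Unrecognized types land in a fallback bucket instead of being sent to a model that was never trained on them.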
Document AI provides confidence scores for every extracted value. Don't blindly trust all extractions. Implement quality controls:
Set thresholds based on your risk tolerance. Financial data might require 0.95+ confidence while internal documents might accept 0.75+. Test thresholds with sample documents to find the right balance between automation and accuracy.
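A minimal sketch of such a quality gate in Python, using the two example thresholds from the text. The record layout (`field`, `value`, `score`) and the sample values are hypothetical.

```python
# Example thresholds from the text. Tune them against your own samples.
THRESHOLDS = {"financial": 0.95, "internal": 0.75}


def triage(extractions, threshold):
    """Split extractions into auto-accepted values and ones needing review."""
    accepted, needs_review = [], []
    for item in extractions:
        score = item.get("score")
        # Missing scores are treated as failing the threshold.
        if score is not None and score >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Hypothetical extracted values for one invoice.
extractions = [
    {"field": "total_amount", "value": "1,250.00", "score": 0.98},
    {"field": "vendor_name", "value": "Acme Supply", "score": 0.81},
    {"field": "po_number", "value": None, "score": None},
]

accepted, needs_review = triage(extractions, THRESHOLDS["financial"])
print([item["field"] for item in accepted])      # ['total_amount']
print([item["field"] for item in needs_review])  # ['vendor_name', 'po_number']
```

With the looser `internal` threshold, `vendor_name` would pass automatically, which shows how the threshold trades review effort against risk.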
Document AI makes it affordable to automate document processing at scale, typically just a few cents per page for low-density documents. The key to controlling costs is understanding what drives them: page count, document density, and number of extracted values all impact compute time and credit consumption. Use small warehouses (X-Small or Small) when running the PREDICT method, extract only the fields you actually need, and implement confidence thresholds for quality control. More importantly, set up alerts on the monitoring queries above so you know when costs spike rather than discovering it in your monthly bill. Start with a small batch of documents, monitor your consumption patterns closely, and scale up as you understand your actual costs.
Let us know how you're using Document AI! We'd love to hear more real-world stories about how customers are using it, and your opinion on its overall effectiveness.
Jeff Skoldberg is a Sales Engineer at SELECT, helping customers get maximum value out of the SELECT app to reduce their Snowflake spend. Prior to joining SELECT, Jeff was a Data and Analytics Consultant with 15+ years experience in automating insights and using data to control business processes. From a technology standpoint, he specializes in Snowflake + dbt + Tableau. From a business topic standpoint, he has experience in Public Utility, Clinical Trials, Publishing, CPG, and Manufacturing.