Snowflake Query Execution: How SQL Parsing, Memory, and Parallel Processing Work
When you run a SQL statement, Snowflake query execution begins by parsing and validating the query, generating an optimized execution plan, and then running it on distributed compute resources. Snowflake query execution relies on cost-based optimization and dynamic task scheduling, using shared memory and multiple threads within a virtual warehouse. Parallel processing occurs as execution threads scan micro-partitions concurrently, while memory is dynamically allocated across operators such as scans, joins, and aggregations to deliver efficient, scalable performance.
1. How Does Snowflake Parse and Execute a SQL Query?
When you submit a SQL query in Snowflake, a lot happens before a single row is returned. It may feel instant, but internally there is a well-defined pipeline that converts your text into a fully distributed execution plan.
Let’s walk through the full lifecycle in a clear, structured way. I’ll use a simple example:
```sql
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;
```
We’ll trace what happens from the moment you hit Enter to the moment results come back.
High-Level Stages
When a query is submitted, Snowflake processes it in these main stages:
- Parsing
- Semantic analysis
- Logical plan generation
- Optimization
- Physical plan creation
- Execution planning and task scheduling
- Distributed execution
- Result construction and return
Now let’s go deeper into each stage.
1. Query Submission
When you execute SQL from:
- Snowsight
- SnowSQL
- JDBC/ODBC
- Python connector
the SQL text is sent to the Cloud Services layer, not directly to the warehouse.
Important separation:
- Cloud Services → parsing, optimization, metadata
- Virtual Warehouse → actual execution
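One simple way to see this split for yourself is the QUERY_HISTORY table function, which reports compilation time (Cloud Services work) separately from execution time (warehouse work). A minimal sketch, assuming you have recent queries in the current session context:

```sql
-- Compare time spent in Cloud Services (compilation) with time spent
-- in the virtual warehouse (execution) for recent queries.
SELECT query_id,
       query_text,
       compilation_time,   -- parsing, validation, optimization (ms)
       execution_time      -- actual work in the warehouse (ms)
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY start_time DESC
LIMIT 10;
```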
2. Parsing (Syntax Analysis)
This is the first real processing step.
The parser checks:
- Is the SQL syntactically valid?
- Are keywords used correctly?
- Are parentheses balanced?
- Is the structure legal?
It does NOT yet check if tables exist.
The parser converts your SQL string into an internal structure called an:
👉 Abstract Syntax Tree (AST)
Think of it as a tree representation of the query.
For our example:
- SELECT
  - columns: customer_id, SUM(amount)
- FROM
  - table: orders
- WHERE
  - condition: order_date >= '2025-01-01'
- GROUP BY
  - customer_id
If there's a syntax mistake, the error is raised here.
End result of parsing:
A structured tree representation of your SQL.
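To see a parse-time failure in isolation, deliberately break the syntax. This is a hedged sketch; the exact error text may vary, but the key point is that the failure happens before any table lookup:

```sql
-- The typo in SELECT is caught by the parser.
-- Snowflake never checks whether the orders table exists.
SELEC customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
-- Fails with a SQL compilation error reporting a syntax error.
```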
3. Semantic Analysis (Validation Stage)
Now Snowflake validates meaning.
It checks:
- Does the orders table exist?
- Does customer_id exist in that table?
- Is amount numeric (valid for SUM)?
- Do you have access privileges?
- Does GROUP BY match selected columns?
It resolves:
- Fully qualified table names
- Column data types
- Schema references
- Role permissions
If a table does not exist or you lack permission, the error is raised here.
End result:
A validated query structure with resolved metadata.
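Contrast that with a query whose syntax is fine but whose identifiers are wrong. The misspelled column below is a made-up example; it fails during this validation stage, not during parsing:

```sql
-- Syntax is valid, so parsing succeeds.
-- Semantic analysis fails because the column cannot be resolved.
SELECT customer_idd, SUM(amount)   -- misspelled on purpose
FROM orders
GROUP BY customer_idd;
-- Fails with an "invalid identifier" compilation error.
```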
4. Logical Plan Generation
Now Snowflake converts the validated query into a logical plan.
The logical plan describes:
- What operations must happen
- In what order
- Independent of hardware
For our query, logical operations are:
- Scan orders
- Filter order_date >= '2025-01-01'
- Group by customer_id
- Compute SUM(amount)
This plan is still abstract.
It does not yet say:
- How many threads
- How many tasks
- Which CPU
- Memory layout
It only defines relational operations.
5. Query Optimization (Cost-Based Optimizer)
This is one of the most important stages.
Snowflake now decides the most efficient way to execute the query.
It uses:
- Table statistics
- Micro-partition metadata
- Data distribution
- Cardinality estimates
- Predicate selectivity
For example, micro-partition pruning:
If only 10 percent of partitions can contain order_date >= '2025-01-01',
Snowflake eliminates the other 90 percent before scanning.
That decision happens here.
Optimizer decisions include:
- Join order (if multiple tables)
- Filter pushdown
- Aggregation pushdown
- Partition pruning
- Choosing hash vs merge join
- Choosing broadcast vs distributed join
- Choosing parallel degree
End result:
An optimized logical execution plan.
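You can preview the compiled plan, including the pruning estimate, without running the query by using EXPLAIN. A minimal sketch with our example query; the output typically lists operators plus partition counts:

```sql
-- EXPLAIN compiles the query but does not execute it.
EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;
-- In the output, compare partitionsTotal with partitionsAssigned
-- to see how many micro-partitions survived pruning.
```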
6. Physical Plan Creation
Now Snowflake converts the logical plan into a physical execution plan.
This stage determines:
- How many execution operators
- How data is distributed
- How tasks are split
- How many parallel threads
For example:
Instead of:
Scan → Filter → Aggregate
It becomes:
- Parallel scan operators
- Local aggregation nodes
- Exchange nodes (for data redistribution)
- Final aggregation node
This physical plan is what you see in:
👉 Query Profile
Operators like:
- TableScan
- Filter
- HashAggregate
- Exchange
- Sort
End result:
A distributed execution DAG (Directed Acyclic Graph).
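After the query has run, the same operator DAG that Query Profile draws can be pulled back as rows with GET_QUERY_OPERATOR_STATS. A sketch, assuming you run it right after the example query so that LAST_QUERY_ID() points at it:

```sql
-- Run the example query first, then inspect its operator tree.
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;

SELECT operator_id,
       parent_operators,
       operator_type,        -- e.g. TableScan, Filter, Aggregate
       operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));
```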
7. Task Scheduling in Warehouse
Now Cloud Services sends the execution plan to the Virtual Warehouse.
Warehouse:
- Allocates compute threads
- Splits work into execution tasks
- Distributes tasks across CPUs
Example:
If the warehouse has 32 cores:
- 32 scan tasks may run in parallel
- Each task reads different micro-partitions
Tasks are created based on:
- Data size
- Partition count
- Operation type
- Available compute
This is where multithreading begins.
8. Distributed Execution
Now execution starts.
Step 1: Scan
Each thread reads assigned micro-partitions.
Step 2: Filter
Predicate applied locally inside each thread.
Step 3: Local Aggregation
Each thread computes partial SUM(amount) per customer_id.
Step 4: Exchange
Data is reshuffled if needed for final grouping.
Step 5: Final Aggregation
Global SUM per customer_id computed.
All this happens in parallel across CPUs.
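If it helps to picture steps 3 to 5 in plain SQL, the query below is a conceptual sketch only: Snowflake performs this two-phase aggregation internally and automatically, so you would never write it this way yourself.

```sql
-- "Local aggregation": each thread computes partial sums per customer_id.
WITH partial_sums AS (
    SELECT customer_id, SUM(amount) AS partial_amount
    FROM orders
    WHERE order_date >= '2025-01-01'
    GROUP BY customer_id
)
-- "Final aggregation": after the exchange, partial sums are combined.
SELECT customer_id,
       SUM(partial_amount) AS total_amount
FROM partial_sums
GROUP BY customer_id;
```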
9. Result Materialization
After final aggregation:
- Results are gathered
- Sorted if required
- Serialized into result set format
If the result set is small:
It is returned directly to the client.
If it is large:
It is stored temporarily in result storage.
Snowflake also caches results for:
👉 Result cache reuse (if an identical query runs again)
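You can observe this yourself by running the identical query twice; the second run is typically answered from the result cache without touching the warehouse. If you need to bypass the cache while testing, the session parameter below disables it:

```sql
-- A second execution of an identical query can be served from the result cache.
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;

-- Disable result cache reuse for the current session (useful for benchmarking).
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```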
What Is the Final End Result of Parsing?
If you strictly ask:
What does parsing produce?
Answer:
It produces an Abstract Syntax Tree (AST), which semantic analysis then validates.
But practically, the end product of the whole compilation pipeline is:
👉 A fully optimized distributed physical execution plan ready for parallel execution.
That plan:
- Knows exactly which partitions to scan
- Knows how data flows between tasks
- Knows aggregation strategy
- Knows parallel degree
- Knows memory allocation strategy
Where You Can See This
You cannot see the AST directly.
But you can see:
- Logical + physical plan in Query Profile
- Operator tree
- Execution graph
- Data volume between nodes
- Skew indicators
Complete Flow Summary
User SQL
→ Parser (syntax check)
→ Semantic Analyzer (object validation)
→ Logical Plan
→ Cost-Based Optimizer
→ Physical Plan
→ Task Scheduler
→ Parallel Execution
→ Result Materialization
→ Client receives rows
Key Architectural Insight
Snowflake separates:
Control Plane (Cloud Services):
- Parsing
- Optimization
- Metadata
- Planning
Data Plane (Warehouse):
- Execution
- Parallel processing
- Thread-level work
This separation allows:
- Better scaling
- Reusable planning
- Efficient execution
- Multi-cluster support
2. How Snowflake Uses Memory and Threads for Parallel Processing — A Practical Explanation
When people first learn that an X-Small Snowflake warehouse has roughly 8 execution threads and around 16 GB of memory, it’s natural to assume that memory is evenly divided per thread. Following that logic, you might calculate:
- 16 GB total memory
- 8 threads
- So 2 GB per thread
Then, since a micro-partition is about 16 MB compressed, you might conclude:
2 GB / 16 MB ≈ 125 micro-partitions per thread
At first glance, this looks correct. But this is not how Snowflake’s execution engine actually works.
Why Memory Is Not Divided Per Thread
Snowflake does not statically allocate memory like:
Thread 1 → 2 GB
Thread 2 → 2 GB
Thread 3 → 2 GB
Instead, the warehouse node has a shared memory pool. All execution operators — scans, joins, aggregations, sorting — request memory dynamically from this shared pool.
Threads are execution units (CPU scheduling units), not memory containers.
So parallelism is determined by:
- Number of execution threads (vCPUs)
- Not by dividing total memory by partition size
Let’s Walk Through a Real Example
Assume:
- Table size: 10 GB (compressed)
- Average micro-partition size: ~16 MB compressed
- Total partitions ≈ 640
- After pruning, 160 partitions must be scanned
- Warehouse: X-Small
- Threads available: ~8
Now execution begins.
What Actually Happens
Snowflake creates a work queue containing 160 scan tasks (one per micro-partition).
At runtime:
- 8 threads pick 8 partitions
- Each thread scans one partition
- When a thread finishes, it picks the next available partition
- This continues until all 160 are processed
So at any given moment:
- Only 8 micro-partitions are actively being processed
- The remaining partitions are waiting in the queue
This is dynamic scheduling.
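You can check both numbers, the total partition count and the partitions actually scanned after pruning, in query history. A sketch against ACCOUNT_USAGE, which assumes access to the shared SNOWFLAKE database and reflects data with some ingestion latency:

```sql
-- Total micro-partitions vs. the work units left after pruning.
SELECT query_id,
       partitions_total,
       partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;
```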
What Happens Inside Each Thread
A thread does not load the entire 16 MB compressed partition into memory and hold it there permanently.
Instead, it:
- Reads required column segments
- Decompresses small chunks
- Processes data in vectorized batches
- Applies filter conditions
- Emits results
- Releases memory buffers
- Moves to the next partition
Memory is reused continuously.
This streaming model keeps memory efficient and avoids spikes.
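When operators do request more memory than the shared pool can provide, Snowflake spills intermediate data to disk instead of failing the query. The columns below, again from ACCOUNT_USAGE query history, are one practical way to spot that memory pressure:

```sql
-- Queries that spilled are candidates for a larger warehouse or a rewrite.
SELECT query_id,
       bytes_spilled_to_local_storage,    -- spilled to the node's local disk
       bytes_spilled_to_remote_storage    -- spilled further, to cloud storage
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE bytes_spilled_to_local_storage > 0
   OR bytes_spilled_to_remote_storage > 0
ORDER BY start_time DESC;
```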
Why Snowflake Does Not Process 125 Partitions Per Thread
Even if 2 GB were theoretically available per thread, processing 125 partitions simultaneously would:
- Increase memory pressure
- Cause CPU cache inefficiency
- Add context switching overhead
- Provide no CPU benefit (still only 8 threads)
Since CPU cores determine true parallelism, Snowflake keeps:
Parallel tasks = number of execution threads
Not = total memory divided by partition size
What Actually Limits Parallelism
Parallel execution depends on:
- Number of vCPUs (threads)
- I/O bandwidth
- Query complexity
- Data distribution
Memory supports execution but does not define how many partitions are processed simultaneously.
Warehouse Size vs Threads and Memory
Snowflake does not publish exact hardware specs, but scaling is predictable. Each size roughly doubles compute and memory.
Here is an approximate model:
| Warehouse Size | Compute Nodes | Approx Threads (vCPUs) | Approx Memory Per Warehouse |
|---|---|---|---|
| X-Small | 1 | ~8 | ~16 GB |
| Small | 2 | ~16 | ~32 GB |
| Medium | 4 | ~32 | ~64 GB |
| Large | 8 | ~64 | ~128 GB |
| X-Large | 16 | ~128 | ~256 GB |
Each size doubles:
- Compute nodes
- Total threads
- Total memory
- Network bandwidth
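Resizing is a single statement; the warehouse name below is hypothetical, and the new size applies to queries that start after the change:

```sql
-- Doubling the size roughly doubles nodes, threads, memory, and bandwidth.
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```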
If you move from X-Small (8 threads) to Medium (32 threads), your 160 partitions would be processed in:
- X-Small → 160 / 8 = 20 scheduling waves
- Medium → 160 / 32 = 5 waves
That’s why scan-heavy queries scale almost linearly with warehouse size.
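One way to verify this on your own workload is to run the same query on two warehouse sizes and compare elapsed time against the number of partitions scanned. The history query below is a sketch and assumes ACCOUNT_USAGE access:

```sql
-- Same query text, different warehouse sizes: elapsed time should drop
-- roughly in line with the number of scheduling waves.
SELECT warehouse_size,
       partitions_scanned,
       total_elapsed_time   -- milliseconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text ILIKE 'SELECT customer_id, SUM(amount)%'
ORDER BY start_time DESC;
```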
The Correct Mental Model
Incorrect model:
Memory per thread determines how many partitions can be processed simultaneously.
Correct model:
Threads determine concurrency.
Micro-partitions determine total work units.
Memory is shared and dynamically allocated.
Snowflake’s design focuses on:
- Small micro-partitions
- Dynamic scheduling
- Streaming execution
- Vectorized processing
- Shared memory management
This is what allows efficient parallelism without manual tuning.