Snowflake Query Execution: How SQL Parsing, Memory, and Parallel Processing Work
When you run a SQL statement, Snowflake query execution begins by parsing and validating the query, generating an optimized execution plan, and then running it on distributed compute resources. Snowflake query execution relies on cost-based optimization and dynamic task scheduling, using shared memory and multiple threads within a virtual warehouse. Parallel processing occurs as execution threads scan micro-partitions concurrently, while memory is dynamically allocated across operators such as scans, joins, and aggregations to deliver efficient, scalable performance.
1. How Does Snowflake Parse and Execute a SQL Query?
When you submit a SQL query in Snowflake, a lot happens before a single row is returned. It may feel instant, but internally there is a well-defined pipeline that converts your text into a fully distributed execution plan.
Let’s walk through the full lifecycle in a clear, structured way. I’ll use a simple example:
```sql
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;
```
We’ll trace what happens from the moment you hit Enter to the moment results come back.
High-Level Stages
When a query is submitted, Snowflake processes it in these main stages:
- Parsing
- Semantic analysis
- Logical plan generation
- Optimization
- Physical plan creation
- Execution planning and task scheduling
- Distributed execution
- Result construction and return
Now let’s go deeper into each stage.
1. Query Submission
When you execute SQL from:
- Snowsight
- SnowSQL
- JDBC/ODBC
- Python connector
the SQL text is sent to the Cloud Services layer, not directly to the warehouse.
Important separation:
- Cloud Services → parsing, optimization, metadata
- Virtual Warehouse → actual execution
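One simple way to see this split for yourself is the QUERY_HISTORY table function, which reports compilation time (Cloud Services work) separately from execution time (warehouse work). A minimal sketch, assuming you have recent queries in the current session context:

```sql
-- Compare time spent in Cloud Services (compilation) with time spent
-- in the virtual warehouse (execution) for recent queries.
SELECT query_id,
       query_text,
       compilation_time,   -- parsing, validation, optimization (ms)
       execution_time      -- actual work in the warehouse (ms)
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY start_time DESC
LIMIT 10;
```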
2. Parsing (Syntax Analysis)
This is the first real processing step.
The parser checks:
- Is the SQL syntactically valid?
- Are keywords used correctly?
- Are parentheses balanced?
- Is the structure legal?
It does NOT yet check if tables exist.
The parser converts your SQL string into an internal structure called an:
👉 Abstract Syntax Tree (AST)
Think of it as a tree representation of the query.
For our example:
- SELECT
  - columns: customer_id, SUM(amount)
- FROM
  - table: orders
- WHERE
  - condition: order_date >= '2025-01-01'
- GROUP BY
  - customer_id
If there's a syntax mistake, the error is raised here.
End result of parsing:
A structured tree representation of your SQL.
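To see a parse-time failure in isolation, deliberately break the syntax. This is a hedged sketch; the exact error text may vary, but the key point is that the failure happens before any table lookup:

```sql
-- The typo in SELECT is caught by the parser.
-- Snowflake never checks whether the orders table exists.
SELEC customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
-- Fails with a SQL compilation error reporting a syntax error.
```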
3. Semantic Analysis (Validation Stage)
Now Snowflake validates meaning.
It checks:
- Does the orders table exist?
- Does customer_id exist in that table?
- Is amount numeric (valid for SUM)?
- Do you have access privileges?
- Does GROUP BY match selected columns?
It resolves:
- Fully qualified table names
- Column data types
- Schema references
- Role permissions
If a table does not exist or you lack permission, the error is raised here.
End result:
A validated query structure with resolved metadata.
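Contrast that with a query whose syntax is fine but whose identifiers are wrong. The misspelled column below is a made-up example; it fails during this validation stage, not during parsing:

```sql
-- Syntax is valid, so parsing succeeds.
-- Semantic analysis fails because the column cannot be resolved.
SELECT customer_idd, SUM(amount)   -- misspelled on purpose
FROM orders
GROUP BY customer_idd;
-- Fails with an "invalid identifier" compilation error.
```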
4. Logical Plan Generation
Now Snowflake converts the validated query into a logical plan.
The logical plan describes:
- What operations must happen
- In what order
- Independent of hardware
For our query, logical operations are:
- Scan orders
- Filter order_date >= '2025-01-01'
- Group by customer_id
- Compute SUM(amount)
This plan is still abstract.
It does not yet say:
- How many threads
- How many tasks
- Which CPU
- Memory layout
It only defines relational operations.
5. Query Optimization (Cost-Based Optimizer)
This is one of the most important stages.
Snowflake now decides the most efficient way to execute the query.
It uses:
- Table statistics
- Micro-partition metadata
- Data distribution
- Cardinality estimates
- Predicate selectivity
For example, micro-partition pruning:
If only 10 percent of partitions can contain order_date >= '2025-01-01',
Snowflake eliminates the other 90 percent before scanning.
That decision happens here.
Optimizer decisions include:
- Join order (if multiple tables)
- Filter pushdown
- Aggregation pushdown
- Partition pruning
- Choosing hash vs merge join
- Choosing broadcast vs distributed join
- Choosing parallel degree
End result:
An optimized logical execution plan.
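You can preview the compiled plan, including the pruning estimate, without running the query by using EXPLAIN. A minimal sketch with our example query; the output typically lists operators plus partition counts:

```sql
-- EXPLAIN compiles the query but does not execute it.
EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;
-- In the output, compare partitionsTotal with partitionsAssigned
-- to see how many micro-partitions survived pruning.
```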
6. Physical Plan Creation
Now Snowflake converts the logical plan into a physical execution plan.
This stage determines:
- How many execution operators
- How data is distributed
- How tasks are split
- How many parallel threads
For example:
Instead of:
Scan → Filter → Aggregate
It becomes:
- Parallel scan operators
- Local aggregation nodes
- Exchange nodes (for data redistribution)
- Final aggregation node
This physical plan is what you see in:
👉 Query Profile
Operators like:
- TableScan
- Filter
- HashAggregate
- Exchange
- Sort
End result:
A distributed execution DAG (Directed Acyclic Graph).
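After the query has run, the same operator DAG that Query Profile draws can be pulled back as rows with GET_QUERY_OPERATOR_STATS. A sketch, assuming you run it right after the example query so that LAST_QUERY_ID() points at it:

```sql
-- Run the example query first, then inspect its operator tree.
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;

SELECT operator_id,
       parent_operators,
       operator_type,        -- e.g. TableScan, Filter, Aggregate
       operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));
```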
7. Task Scheduling in Warehouse
Now Cloud Services sends the execution plan to the Virtual Warehouse.
Warehouse:
- Allocates compute threads
- Splits work into execution tasks
- Distributes tasks across CPUs
Example:
If the warehouse has 32 cores:
- 32 scan tasks may run in parallel
- Each task reads different micro-partitions
Tasks are created based on:
- Data size
- Partition count
- Operation type
- Available compute
This is where multithreading begins.
8. Distributed Execution
Now execution starts.
Step 1: Scan
Each thread reads assigned micro-partitions.
Step 2: Filter
Predicate applied locally inside each thread.
Step 3: Local Aggregation
Each thread computes partial SUM(amount) per customer_id.
Step 4: Exchange
Data is reshuffled if needed for final grouping.
Step 5: Final Aggregation
Global SUM per customer_id computed.
All this happens in parallel across CPUs.
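If it helps to picture steps 3 to 5 in plain SQL, the query below is a conceptual sketch only: Snowflake performs this two-phase aggregation internally and automatically, so you would never write it this way yourself.

```sql
-- "Local aggregation": each thread computes partial sums per customer_id.
WITH partial_sums AS (
    SELECT customer_id, SUM(amount) AS partial_amount
    FROM orders
    WHERE order_date >= '2025-01-01'
    GROUP BY customer_id
)
-- "Final aggregation": after the exchange, partial sums are combined.
SELECT customer_id,
       SUM(partial_amount) AS total_amount
FROM partial_sums
GROUP BY customer_id;
```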
9. Result Materialization
After final aggregation:
- Results are gathered
- Sorted if required
- Serialized into result set format
If the result set is small:
It is returned directly to the client.
If it is large:
It is stored temporarily in result storage.
Snowflake also caches results for:
👉 Result cache reuse (if an identical query runs again)
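You can observe this yourself by running the identical query twice; the second run is typically answered from the result cache without touching the warehouse. If you need to bypass the cache while testing, the session parameter below disables it:

```sql
-- A second execution of an identical query can be served from the result cache.
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;

-- Disable result cache reuse for the current session (useful for benchmarking).
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```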
What Is the Final End Result of Parsing?
If you strictly ask:
What does parsing produce?
Answer:
It produces an Abstract Syntax Tree (AST), which semantic analysis then validates.
But practically, the end product of the whole compilation pipeline is:
👉 A fully optimized distributed physical execution plan ready for parallel execution.
That plan:
- Knows exactly which partitions to scan
- Knows how data flows between tasks
- Knows aggregation strategy
- Knows parallel degree
- Knows memory allocation strategy
Where You Can See This
You cannot see the AST directly.
But you can see:
- Logical + physical plan in Query Profile
- Operator tree
- Execution graph
- Data volume between nodes
- Skew indicators
Complete Flow Summary
User SQL
→ Parser (syntax check)
→ Semantic Analyzer (object validation)
→ Logical Plan
→ Cost-Based Optimizer
→ Physical Plan
→ Task Scheduler
→ Parallel Execution
→ Result Materialization
→ Client receives rows
Key Architectural Insight
Snowflake separates:
Control Plane (Cloud Services):
- Parsing
- Optimization
- Metadata
- Planning
Data Plane (Warehouse):
- Execution
- Parallel processing
- Thread-level work
This separation allows:
- Better scaling
- Reusable planning
- Efficient execution
- Multi-cluster support
2. How Snowflake Uses Memory and Threads for Parallel Processing — A Practical Explanation
When people first learn that an X-Small Snowflake warehouse has roughly 8 execution threads and around 16 GB of memory, it’s natural to assume that memory is evenly divided per thread. Following that logic, you might calculate:
- 16 GB total memory
- 8 threads
- So 2 GB per thread
Then, since a micro-partition is about 16 MB compressed, you might conclude:
2 GB / 16 MB ≈ 125 micro-partitions per thread
At first glance, this looks correct. But this is not how Snowflake’s execution engine actually works.
Why Memory Is Not Divided Per Thread
Snowflake does not statically allocate memory like:
Thread 1 → 2 GB
Thread 2 → 2 GB
Thread 3 → 2 GB
Instead, the warehouse node has a shared memory pool. All execution operators — scans, joins, aggregations, sorting — request memory dynamically from this shared pool.
Threads are execution units (CPU scheduling units), not memory containers.
So parallelism is determined by:
- Number of execution threads (vCPUs)
- Not by dividing total memory by partition size
Let’s Walk Through a Real Example
Assume:
- Table size: 10 GB (compressed)
- Average micro-partition size: ~16 MB compressed
- Total partitions ≈ 640
- After pruning, 160 partitions must be scanned
- Warehouse: X-Small
- Threads available: ~8
Now execution begins.
What Actually Happens
Snowflake creates a work queue containing 160 scan tasks (one per micro-partition).
At runtime:
- 8 threads pick 8 partitions
- Each thread scans one partition
- When a thread finishes, it picks the next available partition
- This continues until all 160 are processed
So at any given moment:
- Only 8 micro-partitions are actively being processed
- The remaining partitions are waiting in the queue
This is dynamic scheduling.
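You can check both numbers, the total partition count and the partitions actually scanned after pruning, in query history. A sketch against ACCOUNT_USAGE, which assumes access to the shared SNOWFLAKE database and reflects data with some ingestion latency:

```sql
-- Total micro-partitions vs. the work units left after pruning.
SELECT query_id,
       partitions_total,
       partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;
```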
What Happens Inside Each Thread
A thread does not load the entire 16 MB compressed partition into memory and hold it there permanently.
Instead, it:
- Reads required column segments
- Decompresses small chunks
- Processes data in vectorized batches
- Applies filter conditions
- Emits results
- Releases memory buffers
- Moves to the next partition
Memory is reused continuously.
This streaming model keeps memory efficient and avoids spikes.
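When operators do request more memory than the shared pool can provide, Snowflake spills intermediate data to disk instead of failing the query. The columns below, again from ACCOUNT_USAGE query history, are one practical way to spot that memory pressure:

```sql
-- Queries that spilled are candidates for a larger warehouse or a rewrite.
SELECT query_id,
       bytes_spilled_to_local_storage,    -- spilled to the node's local disk
       bytes_spilled_to_remote_storage    -- spilled further, to cloud storage
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE bytes_spilled_to_local_storage > 0
   OR bytes_spilled_to_remote_storage > 0
ORDER BY start_time DESC;
```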
Why Snowflake Does Not Process 125 Partitions Per Thread
Even if 2 GB were theoretically available per thread, processing 125 partitions simultaneously would:
- Increase memory pressure
- Cause CPU cache inefficiency
- Add context switching overhead
- Provide no CPU benefit (still only 8 threads)
Since CPU cores determine true parallelism, Snowflake keeps:
Parallel tasks = number of execution threads
Not = total memory divided by partition size
What Actually Limits Parallelism
Parallel execution depends on:
- Number of vCPUs (threads)
- I/O bandwidth
- Query complexity
- Data distribution
Memory supports execution but does not define how many partitions are processed simultaneously.
Warehouse Size vs Threads and Memory
Snowflake does not publish exact hardware specs, but scaling is predictable. Each size roughly doubles compute and memory.
Here is an approximate model:
| Warehouse Size | Compute Nodes | Approx Threads (vCPUs) | Approx Memory Per Warehouse |
|---|---|---|---|
| X-Small | 1 | ~8 | ~16 GB |
| Small | 2 | ~16 | ~32 GB |
| Medium | 4 | ~32 | ~64 GB |
| Large | 8 | ~64 | ~128 GB |
| X-Large | 16 | ~128 | ~256 GB |
Each size doubles:
- Compute nodes
- Total threads
- Total memory
- Network bandwidth
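Resizing is a single statement; the warehouse name below is hypothetical, and the new size applies to queries that start after the change:

```sql
-- Doubling the size roughly doubles nodes, threads, memory, and bandwidth.
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```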
If you move from X-Small (8 threads) to Medium (32 threads), your 160 partitions would be processed in:
- X-Small → 160 / 8 = 20 scheduling waves
- Medium → 160 / 32 = 5 waves
That’s why scan-heavy queries scale almost linearly with warehouse size.
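One way to verify this on your own workload is to run the same query on two warehouse sizes and compare elapsed time against the number of partitions scanned. The history query below is a sketch and assumes ACCOUNT_USAGE access:

```sql
-- Same query text, different warehouse sizes: elapsed time should drop
-- roughly in line with the number of scheduling waves.
SELECT warehouse_size,
       partitions_scanned,
       total_elapsed_time   -- milliseconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text ILIKE 'SELECT customer_id, SUM(amount)%'
ORDER BY start_time DESC;
```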
The Correct Mental Model
Incorrect model:
Memory per thread determines how many partitions can be processed simultaneously.
Correct model:
Threads determine concurrency.
Micro-partitions determine total work units.
Memory is shared and dynamically allocated.
Snowflake’s design focuses on:
- Small micro-partitions
- Dynamic scheduling
- Streaming execution
- Vectorized processing
- Shared memory management
This is what allows efficient parallelism without manual tuning.