Snowflake Query Execution: How SQL Parsing, Memory, and Parallel Processing Work

When you run a SQL statement, Snowflake begins by parsing and validating the query, generating an optimized execution plan, and then running that plan on distributed compute resources. Execution relies on cost-based optimization and dynamic task scheduling, using shared memory and multiple threads within a virtual warehouse. Parallelism comes from execution threads scanning micro-partitions concurrently, while memory is allocated dynamically across operators such as scans, joins, and aggregations to deliver efficient, scalable performance.

1. How Does Snowflake Parse and Execute a SQL Query?

When you submit a SQL query in Snowflake, a lot happens before a single row is returned. It may feel instant, but internally there is a well-defined pipeline that converts your text into a fully distributed execution plan.

Let’s walk through the full lifecycle in a clear, structured way. I’ll use a simple example:

SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id;

We’ll trace what happens from the moment you hit Enter to the moment results come back.

High-Level Stages

When a query is submitted, Snowflake processes it in these main stages:

  1. Parsing
  2. Semantic analysis
  3. Logical plan generation
  4. Optimization
  5. Physical plan creation
  6. Execution planning and task scheduling
  7. Distributed execution
  8. Result construction and return

Now let’s go deeper, starting from the moment the query is submitted.

1. Query Submission

When you execute SQL from:

  • Snowsight
  • SnowSQL
  • JDBC/ODBC
  • Python connector

The SQL text is sent to the Cloud Services layer, not directly to the warehouse.

Important separation:

  • Cloud Services → parsing, optimization, metadata
  • Virtual Warehouse → actual execution

2. Parsing (Syntax Analysis)

This is the first real processing step.

The parser checks:

  • Is the SQL syntactically valid?
  • Are keywords used correctly?
  • Are parentheses balanced?
  • Is the structure legal?

It does NOT yet check if tables exist.

The parser converts your SQL string into an internal structure called a:

👉 Abstract Syntax Tree (AST)

Think of it as a tree representation of the query.

For our example:

  • SELECT
    • columns: customer_id, SUM(amount)
  • FROM
    • table: orders
  • WHERE
    • condition: order_date >= '2025-01-01'
  • GROUP BY
    • customer_id

If there is a syntax mistake, the error is raised at this stage.

End result of parsing:
A structured tree representation of your SQL.
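
To make the idea concrete, here is a minimal sketch of how such a tree could be modeled in Python. This is an illustration only; Snowflake’s real AST is internal and far richer, and every class name here is hypothetical.

from dataclasses import dataclass

@dataclass
class Column:
    name: str

@dataclass
class FuncCall:
    name: str
    args: list

@dataclass
class SelectStmt:
    projections: list   # items in the SELECT list
    table: str          # FROM clause
    where: tuple        # (column, operator, literal)
    group_by: list      # GROUP BY columns

# A toy AST for:
#   SELECT customer_id, SUM(amount) FROM orders
#   WHERE order_date >= '2025-01-01' GROUP BY customer_id;
ast = SelectStmt(
    projections=[Column("customer_id"), FuncCall("SUM", [Column("amount")])],
    table="orders",
    where=(Column("order_date"), ">=", "2025-01-01"),
    group_by=[Column("customer_id")],
)
print(ast)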

3. Semantic Analysis (Validation Stage)

Now Snowflake validates meaning.

It checks:

  • Does the orders table exist?
  • Does customer_id exist in that table?
  • Is amount numeric (valid for SUM)?
  • Do you have access privileges?
  • Does GROUP BY match selected columns?

It resolves:

  • Fully qualified table names
  • Column data types
  • Schema references
  • Role permissions

If the table does not exist or you lack the required privileges, the error is raised at this stage.

End result:
A validated query structure with resolved metadata.
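
Conceptually, this stage is a series of lookups against metadata. Here is a toy Python sketch of that idea, using a hypothetical in-memory catalog; the real catalog lives in the Cloud Services layer.

# Hypothetical catalog: table -> column -> data type.
CATALOG = {
    "orders": {
        "customer_id": "NUMBER",
        "amount": "NUMBER",
        "order_date": "DATE",
    }
}

def validate(table, columns, sum_column):
    if table not in CATALOG:
        # roughly the kind of failure surfaced as "object does not exist or not authorized"
        raise ValueError(f"object '{table}' does not exist or not authorized")
    schema = CATALOG[table]
    for col in columns:
        if col not in schema:
            raise ValueError(f"invalid identifier '{col}'")
    if schema[sum_column] != "NUMBER":
        raise ValueError(f"SUM() requires a numeric column, got {schema[sum_column]}")

validate("orders", ["customer_id", "amount", "order_date"], "amount")   # passes silently
# validate("ordres", ["customer_id"], "amount")                         # would fail here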

4. Logical Plan Generation

Now Snowflake converts the validated query into a logical plan.

The logical plan describes:

  • What operations must happen
  • In what order
  • Independent of hardware

For our query, logical operations are:

  1. Scan orders
  2. Filter order_date >= '2025-01-01'
  3. Group by customer_id
  4. Compute SUM(amount)

This plan is still abstract.

It does not yet say:

  • How many threads
  • How many tasks
  • Which CPU
  • Memory layout

It only defines relational operations.
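
If you wrote that plan down as data, it would be nothing more than an ordered list of relational operators. A minimal sketch in Python (operator names are illustrative, not Snowflake’s internal ones):

logical_plan = [
    ("Scan",      {"table": "orders"}),
    ("Filter",    {"predicate": "order_date >= '2025-01-01'"}),
    ("Aggregate", {"group_by": ["customer_id"], "measures": ["SUM(amount)"]}),
]

# Note what is absent: no threads, no tasks, no memory sizes, no CPUs.
for operator, properties in logical_plan:
    print(operator, properties)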

5. Query Optimization (Cost-Based Optimizer)

This is one of the most important stages.

Snowflake now decides the most efficient way to execute the query.

It uses:

  • Table statistics
  • Micro-partition metadata
  • Data distribution
  • Cardinality estimates
  • Predicate selectivity

For example:

Micro-partition pruning

If partition metadata shows that only 10 percent of the micro-partitions can contain rows with order_date >= '2025-01-01', Snowflake eliminates the other 90 percent before scanning. That pruning decision is made here.
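
Pruning works because Snowflake keeps metadata, including min/max values per column, for every micro-partition. Here is a toy Python sketch of the comparison, with made-up partition metadata:

from datetime import date

# Made-up per-partition metadata; the real metadata lives in Cloud Services.
partitions = [
    {"id": 1, "order_date_min": date(2024, 11, 1),  "order_date_max": date(2024, 12, 31)},
    {"id": 2, "order_date_min": date(2024, 12, 15), "order_date_max": date(2025, 1, 10)},
    {"id": 3, "order_date_min": date(2025, 1, 1),   "order_date_max": date(2025, 2, 1)},
]

cutoff = date(2025, 1, 1)

# Keep only partitions whose max order_date could satisfy order_date >= cutoff.
to_scan = [p["id"] for p in partitions if p["order_date_max"] >= cutoff]
print(to_scan)   # [2, 3] -- partition 1 is eliminated without ever being read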

Optimizer decisions include:

  • Join order (if multiple tables)
  • Filter pushdown
  • Aggregation pushdown
  • Partition pruning
  • Choosing hash vs merge join
  • Choosing broadcast vs distributed join
  • Choosing parallel degree

End result:
An optimized logical execution plan.

6. Physical Plan Creation

Now Snowflake converts the logical plan into a physical execution plan.

This stage determines:

  • How many execution operators
  • How data is distributed
  • How tasks are split
  • How many parallel threads

For example:

Instead of:
Scan → Filter → Aggregate

It becomes:

  • Parallel scan operators
  • Local aggregation nodes
  • Exchange nodes (for data redistribution)
  • Final aggregation node

This physical plan is what you see in:

👉 Query Profile

Operators like:

  • TableScan
  • Filter
  • HashAggregate
  • Exchange
  • Sort

End result:
A distributed execution DAG (Directed Acyclic Graph).
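
As a rough mental picture, the physical plan is a small graph of operators whose outputs feed other operators. A toy Python sketch, with node names loosely echoing Query Profile operator names:

# Each key is an operator; the list holds the operators its output flows into.
dag = {
    "TableScan[orders]":              ["Filter"],
    "Filter":                         ["Aggregate (local)"],
    "Aggregate (local)":              ["Exchange (hash on customer_id)"],
    "Exchange (hash on customer_id)": ["Aggregate (final)"],
    "Aggregate (final)":              ["Result"],
    "Result":                         [],
}

# Walk the single path from scan to result.
node = "TableScan[orders]"
while dag[node]:
    print(node, "->", dag[node][0])
    node = dag[node][0]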

7. Task Scheduling in the Warehouse

Now Cloud Services sends the execution plan to the Virtual Warehouse.

The warehouse:

  • Allocates compute threads
  • Splits work into execution tasks
  • Distributes tasks across CPUs

Example:

If warehouse has 32 cores:

  • 32 scan tasks may run in parallel
  • Each task reads different micro-partitions

Tasks are created based on:

  • Data size
  • Partition count
  • Operation type
  • Available compute

This is where multithreading begins.

8. Distributed Execution

Now execution starts.

Step 1: Scan

Each thread reads assigned micro-partitions.

Step 2: Filter

Predicate applied locally inside each thread.

Step 3: Local Aggregation

Each thread computes partial SUM(amount) per customer_id.

Step 4: Exchange

Data is reshuffled if needed for final grouping.

Step 5: Final Aggregation

Global SUM per customer_id computed.

All this happens in parallel across CPUs.
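
The partial-then-final aggregation pattern is easy to see in a toy Python sketch. The rows and thread batches below are made up; the point is that each thread sums locally and the results are merged once at the end:

from collections import defaultdict

# Rows each "thread" produced after scanning and filtering: (customer_id, amount).
thread_batches = [
    [(1, 10.0), (2, 5.0), (1, 7.5)],    # thread A
    [(2, 2.5), (3, 20.0)],              # thread B
    [(1, 1.0), (3, 4.0), (2, 8.0)],     # thread C
]

# Step 3: each thread computes a partial SUM(amount) per customer_id.
partials = []
for batch in thread_batches:
    local = defaultdict(float)
    for customer_id, amount in batch:
        local[customer_id] += amount
    partials.append(local)

# Steps 4-5: partial results are exchanged by key and merged into the global SUM.
final = defaultdict(float)
for local in partials:
    for customer_id, subtotal in local.items():
        final[customer_id] += subtotal

print(dict(final))   # {1: 18.5, 2: 15.5, 3: 24.0}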

9. Result Materialization

After final aggregation:

  • Results are gathered
  • Sorted if required
  • Serialized into result set format

If the result set is small:
It is returned directly to the client.

If it is large:
It is stored temporarily in result storage.

Snowflake also caches results for:

👉 Result cache reuse (if an identical query runs again)

What Is the Final End Result of Parsing?

If you strictly ask:

What does parsing produce?

Answer:
Parsing itself produces an Abstract Syntax Tree (AST), which semantic analysis then validates.

But practically, the end product of the whole compilation pipeline is:

👉 A fully optimized distributed physical execution plan ready for parallel execution.

That plan:

  • Knows exactly which partitions to scan
  • Knows how data flows between tasks
  • Knows aggregation strategy
  • Knows parallel degree
  • Knows memory allocation strategy

Where You Can See This

You cannot see the AST directly.

But you can see:

  • Logical + physical plan in Query Profile
  • Operator tree
  • Execution graph
  • Data volume between nodes
  • Skew indicators

Complete Flow Summary

User SQL
→ Parser (syntax check)
→ Semantic Analyzer (object validation)
→ Logical Plan
→ Cost-Based Optimizer
→ Physical Plan
→ Task Scheduler
→ Parallel Execution
→ Result Materialization
→ Client receives rows

Key Architectural Insight

Snowflake separates:

Control Plane (Cloud Services):

  • Parsing
  • Optimization
  • Metadata
  • Planning

Data Plane (Warehouse):

  • Execution
  • Parallel processing
  • Thread-level work

This separation allows:

  • Better scaling
  • Reusable planning
  • Efficient execution
  • Multi-cluster support

2. How Snowflake Uses Memory and Threads for Parallel Processing — A Practical Explanation

When people first learn that an X-Small Snowflake warehouse has roughly 8 execution threads and around 16 GB of memory, it’s natural to assume that memory is evenly divided per thread. Following that logic, you might calculate:

  • 16 GB total memory
  • 8 threads
  • So 2 GB per thread

Then, since a micro-partition is about 16 MB compressed, you might conclude:

2 GB / 16 MB ≈ 125 micro-partitions per thread

At first glance, this looks correct. But this is not how Snowflake’s execution engine actually works.

Why Memory Is Not Divided Per Thread

Snowflake does not statically allocate memory like:

Thread 1 → 2 GB
Thread 2 → 2 GB
Thread 3 → 2 GB

Instead, the warehouse node has a shared memory pool. All execution operators — scans, joins, aggregations, sorting — request memory dynamically from this shared pool.

Threads are execution units (CPU scheduling units), not memory containers.

So parallelism is determined by:

  • Number of execution threads (vCPUs)
  • Not by dividing total memory by partition size
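
A toy Python model of that shared pool looks like this. The class and numbers are hypothetical; the point is that operators borrow from one node-wide pool and return what they used, rather than each thread owning a fixed slice:

import threading

class SharedMemoryPool:
    """Toy model of a node-wide memory pool that execution operators borrow from."""

    def __init__(self, total_mb):
        self.available_mb = total_mb
        self.lock = threading.Lock()

    def acquire(self, mb):
        with self.lock:
            if mb > self.available_mb:
                return False      # a real engine would wait, shrink its request, or spill to disk
            self.available_mb -= mb
            return True

    def release(self, mb):
        with self.lock:
            self.available_mb += mb

pool = SharedMemoryPool(total_mb=16_000)   # one shared pool for the node, not 2 GB per thread

# Any operator (scan, join, aggregation, sort) grabs what it needs, then gives it back.
if pool.acquire(512):
    # ... process a batch of data ...
    pool.release(512)

print(pool.available_mb)   # 16000 again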

Let’s Walk Through a Real Example

Assume:

  • Table size: 10 GB (compressed)
  • Average micro-partition size: ~16 MB compressed
  • Total partitions ≈ 640
  • After pruning, 160 partitions must be scanned
  • Warehouse: X-Small
  • Threads available: ~8

Now execution begins.

What Actually Happens

Snowflake creates a work queue containing 160 scan tasks (one per micro-partition).

At runtime:

  • 8 threads pick 8 partitions
  • Each thread scans one partition
  • When a thread finishes, it picks the next available partition
  • This continues until all 160 are processed

So at any given moment:

  • Only 8 micro-partitions are actively being processed
  • The remaining partitions are waiting in the queue

This is dynamic scheduling.
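
Here is a minimal Python sketch of that pattern: a shared queue of 160 scan tasks drained by 8 worker threads. It is a conceptual model of the scheduling behavior, not Snowflake’s actual code:

import queue
import threading

NUM_THREADS = 8           # roughly what an X-Small warehouse offers
work_queue = queue.Queue()

# One scan task per micro-partition that survived pruning.
for partition_id in range(160):
    work_queue.put(partition_id)

def scan_partition(partition_id):
    pass                  # placeholder for "read, decompress, filter, partially aggregate"

def worker():
    while True:
        try:
            partition_id = work_queue.get_nowait()   # grab the next available task
        except queue.Empty:
            return                                    # queue drained, this thread is done
        scan_partition(partition_id)
        work_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# At any moment, at most 8 partitions are in flight; the rest wait in the queue.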

What Happens Inside Each Thread

A thread does not load the entire 16 MB compressed partition into memory and hold it there permanently.

Instead, it:

  1. Reads required column segments
  2. Decompresses small chunks
  3. Processes data in vectorized batches
  4. Applies filter conditions
  5. Emits results
  6. Releases memory buffers
  7. Moves to the next partition

Memory is reused continuously.

This streaming model keeps memory efficient and avoids spikes.
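
A rough Python sketch of that streaming loop is below. The chunk sizes and the predicate are made up; the point is that only one small batch is alive at a time:

def column_chunks(partition_id, batch_size=4096):
    """Toy stand-in for reading and decompressing one small column segment at a time."""
    for start in range(0, 16_384, batch_size):        # pretend the column has 16k values
        yield list(range(start, start + batch_size))  # one decompressed batch

def scan_partition(partition_id, predicate):
    matches = 0
    for batch in column_chunks(partition_id):
        matches += sum(1 for value in batch if predicate(value))   # filter in batches
        # the batch goes out of scope here, so its buffer is reclaimed before the
        # next chunk is decompressed; nothing ever holds the whole partition
    return matches

print(scan_partition(1, predicate=lambda value: value % 2 == 0))   # 8192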

Why Snowflake Does Not Process 125 Partitions Per Thread

Even if 2 GB were theoretically available per thread, processing 125 partitions simultaneously would:

  • Increase memory pressure
  • Cause CPU cache inefficiency
  • Add context switching overhead
  • Provide no CPU benefit (still only 8 threads)

Since CPU cores determine true parallelism, Snowflake keeps:

Parallel tasks = number of execution threads

Not = total memory divided by partition size

What Actually Limits Parallelism

Parallel execution depends on:

  • Number of vCPUs (threads)
  • I/O bandwidth
  • Query complexity
  • Data distribution

Memory supports execution but does not define how many partitions are processed simultaneously.

Warehouse Size vs Threads and Memory

Snowflake does not publish exact hardware specs, but scaling is predictable. Each size roughly doubles compute and memory.

Here is an approximate model:

Warehouse Size | Compute Nodes | Approx Threads (vCPUs) | Approx Memory Per Warehouse
X-Small        | 1             | ~8                     | ~16 GB
Small          | 2             | ~16                    | ~32 GB
Medium         | 4             | ~32                    | ~64 GB
Large          | 8             | ~64                    | ~128 GB
X-Large        | 16            | ~128                   | ~256 GB

Each size doubles:

  • Compute nodes
  • Total threads
  • Total memory
  • Network bandwidth

If you move from X-Small (8 threads) to Medium (32 threads), your 160 partitions would be processed in:

  • X-Small → 160 / 8 = 20 scheduling waves
  • Medium → 160 / 32 = 5 waves

That’s why scan-heavy queries scale almost linearly with warehouse size.
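
The arithmetic generalizes to any size. A quick Python sketch, using the approximate thread counts from the table above:

import math

partitions_to_scan = 160
approx_threads = {"X-Small": 8, "Small": 16, "Medium": 32, "Large": 64, "X-Large": 128}

for size, threads in approx_threads.items():
    waves = math.ceil(partitions_to_scan / threads)
    print(f"{size}: ~{threads} threads -> about {waves} scheduling wave(s)")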

The Correct Mental Model

Incorrect model:

Memory per thread determines how many partitions can be processed simultaneously.

Correct model:

Threads determine concurrency.
Micro-partitions determine total work units.
Memory is shared and dynamically allocated.

Snowflake’s design focuses on:

  • Small micro-partitions
  • Dynamic scheduling
  • Streaming execution
  • Vectorized processing
  • Shared memory management

This is what allows efficient parallelism without manual tuning.