Snowflake Dynamic Tables help you build near real-time data pipelines by automating incremental transformations. You define a query once, and Snowflake keeps the results fresh based on a chosen TARGET_LAG. Unlike traditional tables or materialized views, Dynamic Tables simplify ELT pipelines and reduce orchestration needs.
These Snowflake Dynamic Table interview questions and answers highlight key concepts, how Dynamic Tables compare to Streams and Tasks, and benefits such as incremental refresh, performance, and cost management.
For more detailed information, you can read the official Snowflake documentation on Dynamic Tables.
Related Read: Master Snowflake JSON: Interview Questions & Answers – L3
Q1. What is a Dynamic Table in Snowflake?
A Dynamic Table is a special kind of table in Snowflake that automatically keeps its data fresh. You don’t manually load data into it. Instead, you define it using a SQL query that pulls from other tables. Snowflake then takes on the job of running that query in the background on a schedule you set. It automatically figures out what changes happened in the original source tables and only updates the Dynamic Table with those new changes. This means it always shows a recently computed result based on the source data, making it a powerful and easy way to keep summary data or transformed data up-to-date without writing complex code to update it yourself.
Think of it like a live report. For example, imagine you have a massive table of every single sales transaction. You often need to see the total sales for each product category. Instead of writing a slow, heavy query that sums up millions of rows every time someone asks, you can create a Dynamic Table. Your query would be SELECT product_category, SUM(sale_amount) FROM raw_sales GROUP BY product_category. Once created, this Dynamic Table is a real table that only holds those category totals. Every few minutes (or whatever schedule you choose), Snowflake checks the main raw_sales table for new transactions. It then quickly calculates the new totals and updates the Dynamic Table. When your team queries the small Dynamic Table, they get the answer instantly, and the data is always just a few minutes old.
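As a rough sketch, the full statement for that example might look like the following, where the 5-minute lag and the my_wh warehouse are illustrative choices rather than requirements of the scenario:
CREATE OR REPLACE DYNAMIC TABLE category_sales
TARGET_LAG = '5 minutes'
WAREHOUSE = my_wh
AS
SELECT product_category, SUM(sale_amount) AS total_sales
FROM raw_sales
GROUP BY product_category;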
Q2. How is Snowflake Dynamic Table different from a regular table?
A regular table in Snowflake is a storage object where you insert or load data manually using INSERT statements, COPY INTO, or pipelines like Snowpipe. The data stays as-is until you explicitly run queries or transformations on it. For example, if you want an aggregated sales table, you write a query to insert or update it from your raw data whenever you choose.
A Snowflake Dynamic Table, on the other hand, is more like a self-refreshing view that Snowflake manages for you. You define it with a SQL query, and Snowflake automatically keeps the table up to date with the latest data from the source. You don’t have to write or schedule extra ETL jobs.
For example:
CREATE OR REPLACE DYNAMIC TABLE daily_sales
TARGET_LAG = '1 hour'
WAREHOUSE = my_wh
AS
SELECT DATE(order_time) AS order_date, SUM(amount) AS total_sales
FROM raw_orders
GROUP BY DATE(order_time);
Here, Snowflake checks the raw_orders table regularly (with a maximum lag of 1 hour) and ensures daily_sales always has near real-time aggregated results.
So, the key differences are:
- Regular table: Static, updated only when you load or modify it.
- Dynamic table: Automatically updated based on defined query logic, without manual ETL.
Dynamic tables reduce operational work and are ideal for building near real-time reporting tables, whereas regular tables are best for storing raw or batch-loaded data.
Q3. How is Snowflake Dynamic Table different from a Materialized View?
Both pre-compute results, but they are designed for different levels of complexity and give you different controls. A Materialized View is best for simple, fast pre-computation on a single table. Its query can only reference one table, and it’s designed to be refreshed very quickly, almost instantly after a change, to provide the freshest data possible. A Dynamic Table is much more powerful and flexible. It can be built using a complex query that joins many tables and uses features like UNION, PIVOT, and window functions. You trade a bit of the ultra-low latency for this massive increase in capability. You also have direct control over its refresh schedule using the TARGET_LAG parameter.
Imagine you want a view that shows the top-selling product each day. A Materialized View can’t do this because it likely requires a window function (like RANK()) over data from an orders table joined with a products table—this is too complex. A Materialized View is limited to a simple SELECT product_id, SUM(quantity) FROM orders GROUP BY product_id. A Dynamic Table can easily handle the complex query to find the daily winner. You could set its TARGET_LAG to 5 minutes, meaning your “top product” data would never be more than 5 minutes out of date, which is perfect for a daily report, and far easier than building a custom pipeline to update a regular table.
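A sketch of that “daily winner” definition, assuming orders and products tables with order_date, product_id, quantity, and product_name columns (a window function like this may cause Snowflake to maintain the table with full rather than incremental refreshes):
CREATE OR REPLACE DYNAMIC TABLE daily_top_product
TARGET_LAG = '5 minutes'
WAREHOUSE = my_wh
AS
SELECT order_date, product_name, units_sold
FROM (
  SELECT o.order_date,
         p.product_name,
         SUM(o.quantity) AS units_sold,
         RANK() OVER (PARTITION BY o.order_date ORDER BY SUM(o.quantity) DESC) AS sales_rank
  FROM orders o
  JOIN products p ON o.product_id = p.product_id
  GROUP BY o.order_date, p.product_name
)
WHERE sales_rank = 1;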
Q4. What are the main use cases of Snowflake Dynamic Tables?
The main use cases are for simplifying data pipelines that need to transform and aggregate data continuously. First, they are perfect for creating summary tables and dashboards. Instead of running heavy queries on huge tables every time a user opens a dashboard, the dashboard can query a small, pre-built Dynamic Table, leading to lightning-fast load times. Second, they are excellent for data transformation. You can use them to clean, filter, join, and reshape data as it arrives, creating a new, analysis-ready table without any manual effort. This is often called creating “medallion architecture” layers (like a Silver or Gold table) automatically.
For a real-time example, consider an e-commerce company. Their raw data is a messy stream of page clicks, cart additions, and purchases. One use case is a dashboard for managers showing total sales per hour. A Dynamic Table can aggregate billions of clickstream records into a simple 24-row-per-day table. The dashboard queries this tiny table and loads instantly. Another use case is preparing data for machine learning. The data science team needs a clean table where each row is a customer with features like “number of purchases in last 30 days” and “total money spent.” A complex query joining orders, customers, and returns tables can define this. A Dynamic Table will automatically keep this ML-ready dataset current, so the scientists always train their models on the latest data without asking an engineer to rebuild the dataset.
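The hourly sales case above might be sketched like this, assuming a clickstream table with event_type, event_time, and amount columns:
CREATE OR REPLACE DYNAMIC TABLE hourly_sales
TARGET_LAG = '5 minutes'
WAREHOUSE = my_wh
AS
SELECT DATE_TRUNC('hour', event_time) AS sale_hour,
       SUM(amount) AS total_sales
FROM clickstream
WHERE event_type = 'purchase'
GROUP BY DATE_TRUNC('hour', event_time);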
Q5. How do you create a Dynamic Table in Snowflake?
You create a Dynamic Table using a special variation of the CREATE TABLE command. The syntax is CREATE OR REPLACE DYNAMIC TABLE <table_name> .... The two most important parts of this command are the TARGET_LAG clause and the AS SELECT query. The TARGET_LAG tells Snowflake how fresh the data should be (e.g., 1 minute, 5 minutes, 1 hour). The AS SELECT clause is the SQL query that defines what data should be in the table. This query can be as simple or complex as you need, pulling from one or many source tables.
Here is a concrete example. Let’s create a Dynamic Table that shows the total revenue generated by each salesperson for the current month, and we want it to be at most 5 minutes behind the real data.
CREATE OR REPLACE DYNAMIC TABLE sales_performance
TARGET_LAG = '5 minutes'
WAREHOUSE = my_wh
AS
SELECT salesperson_id, SUM(order_amount) AS total_revenue
FROM raw_orders
WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE())
GROUP BY salesperson_id;
When you run this command, Snowflake does a few things. First, it creates a new table called sales_performance. Then, it runs your SELECT query once to populate the table with the initial data. Finally, it sets up a background process. Every 5 minutes (or less, as needed), this process wakes up, checks the raw_orders table for any new or changed orders that happened in the current month, and then updates the total_revenue figures for the affected salespeople in the sales_performance table. You can now query sales_performance anytime and get a fast answer.
Q6. What is the role of the TARGET_LAG parameter in Dynamic Tables?
The TARGET_LAG parameter is the primary control you have over the freshness of the data in your Dynamic Table. It is a balance between performance and cost. Setting a short lag (e.g., ‘1 minute’) means Snowflake will work harder to refresh the table very frequently, which costs more in compute credits but gives you data that is almost real-time. Setting a longer lag (e.g., ‘1 day’) means the table will be refreshed less often, which is cheaper, but your data will be older. It’s your way of telling Snowflake, “It’s okay if the data in this table is up to [this amount of time] behind the actual source data.”
For instance, imagine two Dynamic Tables built from the same website clickstream data. One table powers a live executive dashboard that tracks website traffic as it happens. For this, you would set a very aggressive TARGET_LAG = '1 minute'. This ensures the executives see trends that are at most one minute old. The other table is used for a nightly email report that goes to all employees. This doesn’t need fresh data. You would set TARGET_LAG = '12 hours' or even '1 day'. This tells Snowflake it can bundle up the refreshes and do them in a more cost-efficient way, significantly reducing the compute cost of maintaining that second table. You choose the lag based on the business need.
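If the business need changes later, you can adjust the lag on an existing Dynamic Table instead of recreating it. A minimal sketch, with web_traffic_summary as a placeholder name:
-- Relax the freshness target to reduce refresh costs
ALTER DYNAMIC TABLE web_traffic_summary SET TARGET_LAG = '12 hours';
-- Tighten it again when near real-time data is needed
ALTER DYNAMIC TABLE web_traffic_summary SET TARGET_LAG = '1 minute';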
Q7. How frequently are Dynamic Tables refreshed?
You don’t set a fixed cron-like schedule. Instead, the refresh frequency is automatically managed by Snowflake based on the TARGET_LAG you set and how often the underlying data changes. Snowflake’s internal automation continuously monitors the source tables. When it detects that enough changes have occurred that the Dynamic Table’s data is becoming “stale” relative to its TARGET_LAG, it triggers a refresh. This means a Dynamic Table could refresh many times within a minute if the source data is changing very rapidly, or it might not refresh for an hour if the source data is completely static. The system handles the scheduling for you to meet the freshness goal you defined.
For example, you set up a Dynamic Table with TARGET_LAG = '5 minutes' on a log table that typically gets 10 new records per hour. When no new records arrive, Snowflake skips refreshes entirely because the table is already up to date; when a record does arrive, a small refresh runs within the 5-minute window to pick it up. Now, imagine there’s a sudden spike in traffic and the log table receives 10,000 new records in one minute. Snowflake will detect this massive influx of new data almost immediately and will refresh the Dynamic Table promptly to catch up and keep the data within the 5-minute lag window. The refresh is intelligent and event-driven rather than a blind timer.
Q8. Can you directly insert or update data in a Dynamic Table?
No, you cannot. A Dynamic Table is a read-only object from a user’s perspective. Its entire purpose is to be managed automatically by Snowflake. If you try to run an INSERT, UPDATE, DELETE, or TRUNCATE command on a Dynamic Table, you will get an error. The only way to change the data inside it is by changing the data in the source tables that its defining query uses. When the source data changes, Snowflake’s automated refresh process will eventually pick up that change and update the Dynamic Table accordingly. This is a key design feature that ensures the data in the Dynamic Table always correctly reflects the result of the query you defined.
Let’s say you have a Dynamic Table that shows average customer ratings for products. You notice a mistake in the average for a specific product. You cannot simply run UPDATE product_ratings_dt SET average_rating = 4.5 WHERE product_id = 123;. This command will fail. To fix the data, you must go to the source—the table that holds the individual product reviews. Perhaps you need to correct a typo in a specific review’s rating value there, or maybe you need to delete a fraudulent review. Once you make that change in the underlying product_reviews table, the next refresh cycle of the Dynamic Table will automatically recalculate the average rating for that product, and the correct number will appear.
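A small sketch of that difference, where product_ratings_dt, product_reviews, and the id values are all illustrative:
-- Fails: Dynamic Tables are read-only to users
UPDATE product_ratings_dt SET average_rating = 4.5 WHERE product_id = 123;
-- Works: correct the source row instead; the next refresh recomputes the average
UPDATE product_reviews SET rating = 4 WHERE review_id = 98765;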
Q9. What happens if the underlying source table of a Dynamic Table changes?
It depends on the type of change. If the change is to the data within the source tables (e.g., new rows are inserted, existing rows are updated, or rows are deleted), the Dynamic Table handles it perfectly. Its refresh process is built to identify these changes and update its own content to reflect the new result of its query. However, if the structure of a source table changes (this is called a DDL change), it can break the Dynamic Table. For example, if a column that is used in the Dynamic Table’s query is dropped or renamed, the query will no longer work. The Dynamic Table will enter a paused state and will no longer refresh. You will receive an error message, and you will need to adjust the definition of the Dynamic Table to match the new source table structure before it can start working again.
Imagine your Dynamic Table is defined as SELECT user_id, email FROM raw_users. This works fine. Now, a colleague decides to clean up the raw_users table and changes the name of the email column to user_email. The next time the Dynamic Table tries to refresh, its query will fail because the email column no longer exists. The Dynamic Table will stop refreshing and its status will show as suspended or failed. To fix this, you must alter your Dynamic Table’s definition using a CREATE OR REPLACE ... command to use the new column name: SELECT user_id, user_email AS email FROM raw_users. After you fix the query, the Dynamic Table will resume refreshing normally.
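Put together as a full statement, the repair might look like this sketch, where the table name, lag, and warehouse are assumptions:
CREATE OR REPLACE DYNAMIC TABLE user_contacts
TARGET_LAG = '10 minutes'
WAREHOUSE = my_wh
AS
SELECT user_id, user_email AS email
FROM raw_users;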
Q10. Can Dynamic Tables be queried like normal tables?
Yes, absolutely. This is one of their biggest advantages. Once created, a Dynamic Table exists in your schema just like any other table. You can query it using a standard SELECT * FROM my_dynamic_table statement in a worksheet, from a BI tool like Tableau or Looker, or from another SQL query. You can join it to other tables or even other Dynamic Tables. You can grant permissions on it to other roles and users. From the perspective of someone querying it, there is no difference between a Dynamic Table and a regular table. They get the benefits of fast performance and fresh data without needing to know about the complex automation happening behind the scenes.
For example, your data analyst team uses a tool called Power BI for their reports. You have a regular table called customers and a Dynamic Table you built called customer_lifetime_value. An analyst can build a Power BI chart that joins these two tables together:
SELECT c.customer_name, clv.total_value
FROM customers c
JOIN customer_lifetime_value clv ON c.id = clv.customer_id
ORDER BY clv.total_value DESC;
The analyst does not need to know that customer_lifetime_value is a Dynamic Table. They just know it’s a table that has the data they need. Power BI sends the query to Snowflake, which processes it instantly because it’s just joining two tables. The fact that Snowflake is working hard in the background to keep customer_lifetime_value updated every hour is completely hidden from the end user, who simply enjoys accurate and fast results.
Q11. How do Dynamic Tables handle schema evolution in source data?
Dynamic Tables are not automatically immune to changes in the source tables they depend on. If the change is a simple addition, like a new column being added to a source table, the Dynamic Table is unaffected because its query does not reference that new column. It continues to refresh normally. However, if the change is breaking, the Dynamic Table will pause and stop refreshing. Breaking changes include dropping a column that the Dynamic Table’s query uses, renaming that column, or changing its data type in a way that makes the query fail. Snowflake does not automatically alter the definition of your Dynamic Table. You will get an error, and the system will mark the Dynamic Table as suspended. To fix it, you must manually update the Dynamic Table’s SQL query to match the new source schema.
Imagine your Dynamic Table is built to show customer details: SELECT customer_id, first_name, last_name FROM raw_customers. The marketing team decides the raw_customers table needs a preferred_name column instead of separate first_name and last_name columns. They drop the old columns and add the new one. The next time your Dynamic Table tries to refresh, the query will fail because first_name and last_name no longer exist. The refresh job will stop. To get it working again, you must change the Dynamic Table’s definition to use the new schema, for example: SELECT customer_id, preferred_name AS full_name FROM raw_customers. After you run this CREATE OR REPLACE statement, the Dynamic Table will resume its automated refreshes.
Q12. What are the limitations of Dynamic Tables compared to Streams and Tasks?
Streams and Tasks are lower-level building blocks that offer more granular control, while Dynamic Tables are a higher-level, automated service. A key limitation is that Dynamic Tables are declarative—you say what the result should be, not how to achieve it. Streams and Tasks are procedural, letting you write exact step-by-step SQL logic for transforming data, which is needed for very complex, custom pipelines. Also, Dynamic Tables currently cannot perform actions outside of Snowflake, like calling an external API or sending a notification when data changes. A Task can be set up to send an email alert via an external function if a certain condition is met in the data. Furthermore, Streams track changes at a very detailed level (row-by-row), which is useful for auditing, while Dynamic Tables are focused on the final aggregated state.
Suppose you need a pipeline that does more than just transform data. Your process needs to: 1) Detect new rows in a table, 2) For each new row, call a machine learning model hosted outside Snowflake to get a risk score, 3) Insert the original data plus the risk score into a new table, and 4) If the risk score is very high, immediately send a message to a Slack channel. A Dynamic Table cannot do steps 2 and 4 because it only transforms data inside Snowflake. For this, you would use a Stream to identify the new rows and a Task that runs a stored procedure. This procedure would contain the code to call the external API, process the results, and send the Slack message, giving you the precise control you need.
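For contrast, the Stream-and-Task half of such a pipeline could be sketched as follows; raw_events is an assumed source table and score_and_alert() is a hypothetical stored procedure you would write to call the external model and post to Slack:
-- Track new rows in the source table
CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events;

-- Run custom procedural logic whenever the stream has new rows
CREATE OR REPLACE TASK score_new_events
  WAREHOUSE = my_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM')
AS
  CALL score_and_alert();

-- Tasks are created in a suspended state
ALTER TASK score_new_events RESUME;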
Q13. How is the refresh process of a Dynamic Table scheduled?
The refresh process is not scheduled like a traditional cron job with fixed times. It is an automated, continuous process managed entirely by Snowflake. You set a goal for data freshness using the TARGET_LAG parameter (e.g., 1 minute). Snowflake’s internal scheduler then constantly monitors the source tables for changes. It uses this information to intelligently decide when to refresh the Dynamic Table. It aims to group changes and perform refreshes in the most efficient way possible while still meeting your freshness target. This means the refresh might happen many times a minute if data is changing rapidly, or it might happen only once an hour if changes are infrequent and your TARGET_LAG allows for it.
Think of it like a smart assistant. You tell your assistant, “Keep this report updated so it’s never more than 10 minutes out of date.” You don’t tell them to check for updates every 30 seconds. The assistant sits and watches the data feed. For the first 9 minutes, nothing happens. Then, a few new numbers come in. The assistant waits to see if more will arrive. At the 9-and-a-half-minute mark, seeing no more data, they quickly update the report to meet your 10-minute promise. The next day, there’s a huge, constant stream of data. The assistant is constantly updating the report every few seconds to keep up with the flow and stay within the 10-minute lag. The assistant (Snowflake) handles the timing; you just set the freshness requirement.
Q14. What is the difference between incremental refresh and full refresh in Dynamic Tables?
An incremental refresh is highly efficient. Snowflake identifies only the data that changed in the source tables since the last refresh and then computes only the necessary changes to update the Dynamic Table. A full refresh is much more heavy-handed; it completely re-runs the entire SQL query against the full source data, wiping the Dynamic Table and rebuilding it from scratch. Snowflake automatically tries to use incremental refreshes whenever possible because they are faster and cheaper. It will only fall back to a full refresh if the change to the source data is so complex that an incremental update isn’t feasible, or if you manually alter the definition of the Dynamic Table’s query.
Consider a Dynamic Table that calculates the total number of website visitors per day. On a normal day, as new visitors arrive, Snowflake performs incremental refreshes. It sees that 50 new visitor records were added for “2024-05-17”. It simply adds 50 to the existing total for that day in the Dynamic Table. This is very quick. Now, imagine you need to change the logic. You realize you should only count unique visitors, so you change the query to use COUNT(DISTINCT visitor_id) instead of COUNT(*). This is a fundamental change to the calculation. Snowflake cannot incrementally apply this new logic to the old data. Therefore, when you save this new query definition, it triggers a full refresh. Snowflake will scan the entire history of visitor data to recalculate the correct unique counts for every single day, which takes more time and compute power.
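The redefinition that triggers that rebuild could be sketched like this, assuming a web_visits source table with visit_date and visitor_id columns:
-- Replacing the definition changes the calculation, so Snowflake rebuilds the table from scratch
CREATE OR REPLACE DYNAMIC TABLE daily_visitors
TARGET_LAG = '15 minutes'
WAREHOUSE = my_wh
AS
SELECT visit_date, COUNT(DISTINCT visitor_id) AS unique_visitors
FROM web_visits
GROUP BY visit_date;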
Q15. Can you join multiple source tables when defining a Dynamic Table?
Yes, this is one of the key strengths of Dynamic Tables and a major differentiator from Materialized Views. The SQL query you use to define a Dynamic Table can be very complex and can include joins across many different source tables. You can use INNER JOIN, LEFT JOIN, and other join types. You can also use other SQL clauses like UNION, PIVOT, and window functions (e.g., RANK(), LEAD()). This allows you to create rich, transformed datasets that are automatically kept in sync with multiple source systems, which is a very common requirement in real-world data engineering.
A common example is creating a customer 360 view. Your data is spread across several tables: customers (name, email), orders (purchase history), and support_tickets (customer inquiries). You can create a single, easy-to-query Dynamic Table that joins all this information together for each customer.
CREATE OR REPLACE DYNAMIC TABLE customer_360
TARGET_LAG = '10 minutes' -- TARGET_LAG and WAREHOUSE are required; values here are illustrative
WAREHOUSE = my_wh
AS
SELECT c.customer_id, c.name, c.email,
       COUNT(DISTINCT o.order_id) AS total_orders, -- DISTINCT avoids double-counting across the two joins
       MAX(o.order_date) AS last_order_date,
       COUNT(DISTINCT t.ticket_id) AS open_tickets
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN support_tickets t ON c.customer_id = t.customer_id AND t.status = 'Open'
GROUP BY c.customer_id, c.name, c.email;
This Dynamic Table will automatically update whenever a new order is placed, a support ticket is opened or closed, or a customer record is updated, giving you an always-up-to-date view of each customer’s activity.
Q16. How do you monitor the refresh status of a Dynamic Table?
You monitor the status by querying the metadata Snowflake exposes for Dynamic Tables, most directly the INFORMATION_SCHEMA.DYNAMIC_TABLES table function (or the SNOWFLAKE.ACCOUNT_USAGE views for account-wide, longer-retention history); a quick SHOW DYNAMIC TABLES; command gives a similar overview. This metadata shows crucial information about each Dynamic Table: its target lag, its refresh mode (‘FULL’, ‘INCREMENTAL’, or ‘AUTO’ when Snowflake picks the mode itself), its scheduling state (whether refreshes are active or have been suspended after a failure), and details of recent refreshes. This lets you quickly check the health and freshness of all your Dynamic Tables.
As a data engineer, you might start your day with a quick health check:
SHOW DYNAMIC TABLES;
-- or, for a result you can filter and join like any query:
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLES());
If you see a Dynamic Table whose scheduling state is ‘SUSPENDED’, you know something is wrong—perhaps a source column was dropped and its refreshes keep failing. If you see a table whose last successful refresh is many hours old while its target_lag is 5 minutes, you know it is falling behind and not meeting its goal, which might mean the warehouse doesn’t have enough resources. This quick check gives you a dashboard of your entire data pipeline’s health.
Q17. What system views or functions can you use to check the history of Dynamic Table refreshes?
To get detailed historical data, you use the SNOWFLAKE.ACCOUNT_USAGE.DYNAMIC_TABLE_REFRESH_HISTORY view. This is a powerful view in the account usage schema that provides a log of every refresh operation for all Dynamic Tables in your account. For each refresh, it tells you when it started and ended, whether it was incremental or full, statistics about the rows processed, and whether it succeeded. This history is vital for troubleshooting performance issues, understanding compute costs, and auditing the pipeline’s behavior over time. An INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY table function also exists; it covers only recent history but without the latency of the ACCOUNT_USAGE view.
For example, your finance team asks why their summary report was slow to update yesterday between 2 PM and 3 PM. You can query the history to investigate:
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.DYNAMIC_TABLE_REFRESH_HISTORY
WHERE name = 'FINANCIAL_SUMMARY' -- object names in ACCOUNT_USAGE are stored in upper case
  AND refresh_start_time >= '2024-05-16 14:00:00'
  AND refresh_end_time <= '2024-05-16 15:00:00';
The results might show that a full refresh was triggered at 2:15 PM because of a schema change, which took 40 minutes to complete, explaining the latency. Or, it might show that an incremental refresh processed an unusually high number of rows due to a data spike, which caused the refresh to take longer than usual. This data is essential for root cause analysis.
Q18. Can Dynamic Tables be used as sources for other Dynamic Tables?
Yes, you can build chains of Dynamic Tables. This is a very powerful pattern for building multi-layered data pipelines. The output of one Dynamic Table can be used as the source for another. This allows you to break down complex transformations into smaller, more manageable steps. For instance, you could have a first Dynamic Table that cleans and filters raw data. A second Dynamic Table could then join this cleaned data with other sources. A third Dynamic Table could then take that joined data and perform the final aggregation for a specific business report. Each table in the chain has its own TARGET_LAG, allowing you to balance cost and freshness at each stage.
Imagine a pipeline for a sales commission system. The raw data is a stream of all sales. Step 1: A Dynamic Table called cleaned_sales filters out voided transactions and invalid records. It has a TARGET_LAG of 1 minute to clean data quickly. Step 2: A second Dynamic Table called sales_with_quotas joins the cleaned_sales data with a sales_quotas table to calculate if each sale met its target. This has a TARGET_LAG of 5 minutes. Step 3: A final Dynamic Table called commission_calculations takes the output from sales_with_quotas and applies the complex business rules for calculating the actual commission amount. This final table, with a TARGET_LAG of 15 minutes, is what the payroll system uses. This chaining organizes the logic and manages performance.
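The first two links of that chain might be sketched as follows, with table and column names assumed from the description above:
CREATE OR REPLACE DYNAMIC TABLE cleaned_sales
TARGET_LAG = '1 minute'
WAREHOUSE = my_wh
AS
SELECT sale_id, salesperson_id, amount, sale_date
FROM raw_sales
WHERE status <> 'VOID' AND amount IS NOT NULL;

-- A Dynamic Table can read from another Dynamic Table
CREATE OR REPLACE DYNAMIC TABLE sales_with_quotas
TARGET_LAG = '5 minutes'
WAREHOUSE = my_wh
AS
SELECT s.sale_id, s.salesperson_id, s.amount,
       (s.amount >= q.sale_target) AS met_target
FROM cleaned_sales s
JOIN sales_quotas q ON s.salesperson_id = q.salesperson_id;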
Q19. How does Snowflake ensure data consistency during refreshes?
Snowflake uses its core strength—atomicity and isolation—to guarantee that a refresh provides a consistent view of the data. When a Dynamic Table refreshes, it processes the changes from its source tables based on a specific point in time. This means that even if the source data is being updated constantly while the refresh is happening, the refresh operation works on a consistent snapshot. The result is that the Dynamic Table’s data is always a correct and coherent result of the query as it would have run at that specific point in time. You will never get a result where half of the joined data is from one moment and the other half is from a later moment.
Think of it like taking a group photo. You want a picture of the whole team at 3 PM. Even if people are moving around and coming in and out of the room, the photo captures everyone exactly as they were at that single moment in time. You don’t end up with a photo where one person is looking at their phone at 3:00 and another is waving at 3:01. Similarly, when a Dynamic Table refreshes at a given time T, it takes a “photo” (a consistent snapshot) of all its source tables at time T. It then runs its query against that perfect snapshot. This ensures that the numbers always add up correctly and the joins are consistent, even in a very busy system.
Q20. How do Dynamic Tables impact compute costs compared to Materialized Views or Tasks?
Costs are driven by the compute resources needed for refreshing. Materialized Views are generally the cheapest for very simple, single-table aggregations because their refresh mechanism is extremely lightweight and fast. For complex, multi-table transformations, Dynamic Tables can be more cost-effective than building your own pipeline with Streams and Tasks. This is because Snowflake’s automation is highly optimized for incremental refreshes. A manually built Task might accidentally do a full refresh more often or use a less efficient query, wasting credits. Dynamic Tables automate this efficiency. However, if you need very low latency (sub-one-minute), the constant refreshing of a Dynamic Table could become more expensive than a less frequent Task. You must choose the right tool for the job.
Let’s say you need a daily summary of sales. Option 1: A Materialized View on the sales table. This is very cheap but can’t handle complex logic like joining with a product table to get categories. Option 2: A Dynamic Table that does the join and groups by product category. You set TARGET_LAG = '1 day'. It does one incremental refresh per day, which is very efficient and likely cheap. Option 3: You build your own pipeline with a Stream and a Task that runs every hour. Your SQL might not be as optimized as Snowflake’s engine, causing the Task to use more compute each run. Even though it’s just a daily report, your hourly Task runs 24 times, potentially costing more than the single efficient refresh of the Dynamic Table. The Dynamic Table offers a good balance of power and cost for this use case.
Q21. Explain the internal architecture of Dynamic Tables in Snowflake.
Imagine a Dynamic Table’s architecture as a small, self-sufficient factory inside Snowflake. This factory has a few key parts that work together automatically. First, there is the Blueprint, which is the SQL query you write. This tells the factory exactly what the final product should look like. Second, there is the Manager, a smart system that constantly watches the raw material suppliers (your source tables). The manager uses your TARGET_LAG instruction to decide how urgently new materials need to be processed. Third, there is the Workshop, which is the virtual warehouse you assign. This is where the actual work of building and updating the product happens.
The magic is in the automation. The manager doesn’t wait for you to give orders. It automatically sees when new data arrives in the source tables. It then sends a work order to the workshop. The workshop is clever; it usually doesn’t rebuild the entire product from scratch. Instead, it only takes the new or changed materials and figures out the smallest possible adjustment to the final product. This is called an incremental refresh. The final product—the actual data—is then stored neatly on shelves, which is Snowflake’s cloud storage. To anyone querying it, it looks like a simple, ready-to-use table, but behind the scenes, this entire automated factory is working to keep it current without you lifting a finger.
Q22. How does Snowflake decide when to trigger a refresh of a Dynamic Table?
Snowflake makes this decision based on a combination of your instructions and its own smart monitoring. Your main instruction is the TARGET_LAG. You are basically telling Snowflake, “I’m okay if this data is up to 10 minutes old.” Snowflake takes this as a promise it must keep. It doesn’t use a simple timer. Instead, it constantly checks how much new data is coming into the source tables and how quickly.
Think of it like filling a bathtub with a slow drip from the faucet. You don’t need to check the water level every second. But if someone suddenly dumps a bucket of water into the tub, the level jumps up, and you need to act to prevent it from overflowing. Similarly, Snowflake watches the flow of data. If data is trickling in slowly, it might wait a bit to group several small changes into one efficient refresh. But if there’s a sudden large load of data, it immediately triggers a refresh to process that data and keep the information within the freshness limit you set. It’s a smart, cost-effective system that reacts to the workload instead of just blindly following a clock.
Q23. What are the differences between Dynamic Tables, Materialized Views, and Streams + Tasks in terms of performance and use cases?
Choosing between these is like choosing the right vehicle for a trip.
- Materialized Views are like scooters. They are zippy, simple, and perfect for short, straight trips on a single road. They are incredibly fast for making simple calculations (like a sum or a count) on a single table. But you can’t take them off-road; they can’t handle joins with other tables or complex logic. Use them to speed up a dashboard that pulls from one big table.
- Dynamic Tables are like pickup trucks. They are powerful, versatile, and can carry a heavy load. You can use them for complex journeys that involve multiple stops (joining many tables) and carrying tricky cargo (complex SQL with window functions). They are the go-to tool for automatically building entire new datasets that are always up-to-date. There might be a small delay (a few minutes) as the truck gets loaded, but it does the hard work for you.
- Streams and Tasks are like building your own car from parts. They give you total control. You can build anything you can imagine, even if it has special parts like a radio that calls outside services (APIs). However, you are responsible for every detail—the engine, the steering, the brakes. It’s powerful but requires more work and knowledge to build and maintain. Use this when you need to do something very custom that the other tools can’t do.
Q24. How do you optimize performance when using Dynamic Tables with large datasets?
Making Dynamic Tables efficient with large data is about being a good helper to Snowflake. The most important thing is to set up your source tables for success. This means defining cluster keys on columns that are used often in filters and joins, like a date column or an id column. This helps Snowflake quickly find the new data it needs for an incremental refresh without having to search through the entire massive table every time.
Next, write a smart query. Avoid unnecessary complexity that makes it hard for Snowflake to figure out what changed. Use clear join conditions and simple filters. Also, choose the right size warehouse. A small warehouse might be cheap, but it will be slow to process a large refresh. A larger warehouse will get the job done faster, which might be cheaper overall because it uses compute resources for a shorter amount of time. Finally, be realistic with your TARGET_LAG. If you don’t need data that is less than a minute old, set the lag to 5 or 10 minutes. This gives Snowflake more time to group changes together and perform the refresh in the most efficient way possible.
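For the clustering suggestion above, a minimal sketch on an assumed raw_sales source table:
-- Cluster the large source table on the columns the Dynamic Table filters and joins on
ALTER TABLE raw_sales CLUSTER BY (sale_date, salesperson_id);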
Q25. How do Dynamic Tables handle late-arriving or changed data in the source?
Dynamic Tables handle this correctly because they are always based on the current truth in the source data. When a refresh happens, it looks at all changes that have happened up to that very moment. If a record arrives late—for example, a sales transaction from yesterday was just added today—the next refresh will see this new record. It will then update any calculations that are affected. If your Dynamic Table shows “total sales by day,” the total for yesterday will be updated to include the late sale.
The same happens if data is corrected. If a customer’s address is updated in the source table, the next refresh will process that update. The old, wrong address in the Dynamic Table will be replaced with the new, correct one. This ensures that the Dynamic Table always eventually reflects the most accurate picture based on the source data. There might be a short delay until the next refresh, but you can be sure the data will correct itself automatically without you needing to write any special code to handle these situations.
Q26. Can you cascade multiple Dynamic Tables together? What are the performance considerations?
Yes, you can chain them together, and this is a very common and powerful way to build a data pipeline. For example, Dynamic Table A cleans raw data. Dynamic Table B then joins that clean data with another table. Dynamic Table C then takes the result from B and calculates final summary reports. This lets you break one big, complex job into smaller, easier-to-manage steps.
The main thing to consider is that the timing stacks up. Each Dynamic Table in the chain has its own TARGET_LAG. If Table A is set to update every 5 minutes, and Table B depends on A, then Table B can’t start its refresh until after Table A finishes. If Table B also takes 5 minutes, then Table C, which depends on B, will have to wait even longer. The total time for a change to go from the raw source data to the final report is the sum of all the refresh times in the chain. To manage this, you might set a shorter lag for the final table that people query and a longer lag for the earlier processing tables, or just understand that the entire process will take some time.
Q27. How would you troubleshoot a Dynamic Table that is not refreshing as expected?
You would start by doing a basic health check, just like a doctor checking a patient. First, check the table’s metadata and scheduling state:
SHOW DYNAMIC TABLES LIKE 'your_table_name';
Look for two important things:
- scheduling_state: If this says SUSPENDED, the table has completely stopped refreshing, usually because a refresh ran into an error (or because someone suspended it deliberately).
- refresh_mode: This shows whether Snowflake is maintaining the table with INCREMENTAL or FULL refreshes (or choosing automatically with AUTO). A table that has unexpectedly fallen back to FULL refreshes can explain slow, expensive updates.
If the table is suspended, the next step is to find the error message. You can query the refresh history, for example SELECT * FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(NAME => 'your_table_name'));, to see a log of recent refresh attempts. The error message there will usually tell you exactly what went wrong. The most common problem is that someone changed a source table—like renaming or dropping a column—and the Dynamic Table’s query no longer works. The fix is to update the Dynamic Table’s SQL query to match the new source table structure.
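If the table was automatically suspended, you may also need to resume it once the definition is repaired; a minimal sketch:
-- Resume automated refreshes after fixing the underlying problem
ALTER DYNAMIC TABLE your_table_name RESUME;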
Q28. What privileges are required to create and manage Dynamic Tables?
You need a specific set of permissions to build and run this automated factory. First, you need permission to use the space: that’s USAGE on a database and a schema. Second, you need the building permit: that’s the CREATE DYNAMIC TABLE privilege on that specific schema. Third, you need the keys to the workshop: that’s USAGE on a virtual warehouse, which is the engine that does the refresh work. Finally, you need access to the raw materials: that’s SELECT privileges on all the source tables and views that your SQL query uses. Without these permissions, the query can’t read the data it needs to do its job.
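Expressed as SQL, the grants for a role that builds Dynamic Tables might look like this sketch, where the database, schema, warehouse, table, and role names are all placeholders:
GRANT USAGE ON DATABASE analytics_db TO ROLE dt_builder;
GRANT USAGE ON SCHEMA analytics_db.reporting TO ROLE dt_builder;
GRANT CREATE DYNAMIC TABLE ON SCHEMA analytics_db.reporting TO ROLE dt_builder;
GRANT USAGE ON WAREHOUSE my_wh TO ROLE dt_builder;
GRANT SELECT ON TABLE analytics_db.raw.raw_orders TO ROLE dt_builder;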
Q29. How do Dynamic Tables integrate with Snowflake’s query acceleration features?
They work together perfectly because, in the end, a Dynamic Table is just a table. The acceleration works in two ways. First, during the refresh process, if the SQL query is very complex and has to read a huge amount of data, Snowflake’s internal query acceleration can automatically kick in to help scan and process that data faster, making the refresh finish quicker. Second, and more importantly, when users query the Dynamic Table, it is now a stored dataset. You can add features like Search Optimization to the Dynamic Table. This makes certain types of queries, like finding all rows for a specific customer ID, incredibly fast. Since Dynamic Tables are often already small and fast, these features make them even faster for your end-users.
Q30. What are some real-world scenarios where Dynamic Tables are better than ETL pipelines?
Dynamic Tables are better when you want to simplify your life and reduce the amount of code you have to build and maintain.
- Automated Reporting Layers: Instead of coding a complex nightly ETL job to build a summary table for a sales report, you just write one SQL query to define a Dynamic Table. Snowflake then handles all the incremental updates every hour or every few minutes. There’s no external tool to manage, no cron jobs to schedule, and no custom code to write for handling new data.
- Live Business Dashboards: Imagine a customer service team needs a screen that shows a live view of a customer’s recent orders and open support tickets. Building a real-time ETL pipeline for this is hard. With a Dynamic Table, you write a SQL query that joins the customers, orders, and tickets tables. You set a short refresh lag of one minute. You now have an always-fresh dataset that the dashboard can query directly, and it was created with almost no effort.
- Data Cleaning and Transformation Pipelines: A common practice is to have layers of data: raw, cleaned, and ready for analysis. Instead of using three separate ETL scripts to move data between these layers, you can create two Dynamic Tables. The first one cleans the raw data into a nice format. The second one aggregates that clean data for business users. This entire multi-step pipeline is defined and managed within Snowflake with no extra tools or code, making it very easy to manage.
Snowflake Dynamic Table Refresh Mode
Q31. What are the available refresh modes for Dynamic Tables in Snowflake?
Snowflake Dynamic Tables primarily operate with two refresh modes: incremental refresh and full refresh. By default, you do not flip between them like a switch: you define a target for data freshness, and Snowflake’s intelligent automation automatically selects the most efficient mode to meet that target (an optional REFRESH_MODE parameter lets you pin the mode, but AUTO is the default). The system decides whether an incremental or full refresh is appropriate each time based on the nature of the changes in the underlying data.
Q32. How does incremental refresh mode differ from a full refresh?
An incremental refresh is like making a small, precise edit to a document. Snowflake only looks at the data that has changed in the source tables since the last refresh. It then computes the minimal set of changes needed to update the Dynamic Table. This is highly efficient, fast, and cost-effective because it avoids reprocessing vast amounts of unchanged data. For example, if 100 new rows are added to a source table, an incremental refresh will process just those 100 rows to update the aggregated results in the Dynamic Table.
A full refresh, on the other hand, is like throwing away the entire document and rewriting it from scratch. Snowflake completely re-runs the entire SQL query definition of the Dynamic Table against all the source data. It then replaces the entire content of the Dynamic Table with the new result. This process is computationally expensive and slower, especially on large datasets, as it must process every single row in the source tables, not just the changes.
Q33. In which scenarios would you prefer full refresh over incremental refresh?
You would never “prefer” a full refresh due to its cost and slowness; it is something you generally want to avoid. However, a full refresh is sometimes necessary and is automatically triggered by Snowflake in specific scenarios:
- After Changing the Table’s Definition: The most common scenario is when you alter the SQL query of the Dynamic Table using a CREATE OR REPLACE statement. The new logic may be so different that the old data is invalid. For example, if you change a SUM(sale_amount) to an AVG(sale_amount), Snowflake cannot incrementally adjust the old sum into a new average. It must recalculate everything from the beginning to ensure correctness.
- Complex Data Changes: If the changes in the source data are of a type that Snowflake’s incremental engine cannot handle efficiently for your specific query, it may fall back to a full refresh to guarantee the result is 100% accurate.
- Catching Up from a Long Pause: If a Dynamic Table has been suspended for a long time and has fallen very far behind, it can sometimes be faster for Snowflake to perform a full refresh to catch up rather than trying to process a huge number of incremental changes one by one.
Q34. How do you define refresh mode when creating a Dynamic Table?
By default, you do not pin the refresh mode. Your control is exercised mainly through the TARGET_LAG parameter in the CREATE statement: it defines your data freshness requirement, and Snowflake’s automation uses this goal to manage the refresh process, including mode selection. (An optional REFRESH_MODE parameter, with values AUTO, FULL, or INCREMENTAL, is also available, but AUTO is the default and lets Snowflake choose.)
Your command defines the “what” (the query) and the “how fresh” (the target lag), not the “how” (the refresh mode). The syntax is:
CREATE OR REPLACE DYNAMIC TABLE sales_summary
TARGET_LAG = '5 minutes' -- This is the key parameter for freshness
WAREHOUSE = my_wh
AS
SELECT product_id, SUM(quantity) AS total_sold
FROM raw_sales
GROUP BY product_id;
In this example, you are instructing Snowflake: “Keep this data at most 5 minutes behind the source.” Snowflake then decides whether to use an incremental or full refresh for each update to meet this goal efficiently.
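If you do want to pin the behavior rather than leave it to the default AUTO mode, newer Snowflake releases accept an explicit REFRESH_MODE in the same statement; a hedged sketch:
CREATE OR REPLACE DYNAMIC TABLE sales_summary
TARGET_LAG = '5 minutes'
WAREHOUSE = my_wh
REFRESH_MODE = INCREMENTAL -- AUTO (default) lets Snowflake choose; FULL forces complete rebuilds
AS
SELECT product_id, SUM(quantity) AS total_sold
FROM raw_sales
GROUP BY product_id;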
Q35. What role does Snowflake’s internal change tracking play in refresh mode selection?
Snowflake’s internal change tracking is the fundamental technology that makes incremental refresh possible. It is the mechanism that allows Snowflake to know exactly what data has been added, changed, or deleted in the source tables since the last refresh.
This capability is central to the mode selection logic:
- For Incremental Refresh: The change tracking system provides a precise list of altered rows. The refresh process can then consume only this “delta” of changes. If the changes are easily applicable to the existing Dynamic Table data, Snowflake will always choose an incremental refresh for speed and efficiency.
- Forcing a Full Refresh: The change tracking also helps Snowflake understand the nature of the changes. If the changes are so widespread or complex that applying them incrementally would be inefficient or could risk data accuracy, Snowflake’s engine will decide that a full refresh is the safer and more correct option.
In essence, change tracking provides the fuel for the incremental refresh mode. Without it, Snowflake would be forced to perform a full refresh every time to find what had changed. This internal automation is what makes Dynamic Tables so powerful and low-maintenance, as it handles the complex decision-making of change detection and application for you.
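Snowflake generally enables change tracking on the source objects automatically when you create the Dynamic Table (assuming the creating role has the necessary privileges), but it can also be turned on explicitly; a minimal sketch on an assumed raw_sales table:
-- Explicitly enable change tracking on a source table
ALTER TABLE raw_sales SET CHANGE_TRACKING = TRUE;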
Snowflake Dynamic Table Limitations
Q36. What are some key limitations of Dynamic Tables compared to regular tables or materialized views?
Dynamic Tables are powerful but have specific constraints. Unlike regular tables, they are read-only and cannot be manually updated. Compared to Materialized Views, they offer much more complex query support (like multi-table joins) but may have higher refresh latency and cost. Materialized Views are limited to a single base table but can refresh much faster and are better for point queries on that table.
Q37. Can you directly insert, update, or delete rows in a Dynamic Table? Why or why not?
No, you cannot directly perform INSERT, UPDATE, or DELETE operations on a Dynamic Table. This is a core design feature. The entire content of a Dynamic Table is exclusively managed by Snowflake’s automated refresh process, which recalculates the data based on the defined SQL query and the changes in the underlying source tables. This ensures the data always remains a consistent and accurate result of that query.
Q38. What are the current restrictions on joins, aggregations, or nested queries inside Dynamic Tables?
While far more flexible than Materialized Views, some complex SQL constructs are not supported. Common limitations include certain types of nested subqueries (like correlated subqueries in the SELECT list), some advanced window functions, and non-deterministic functions like CURRENT_TIMESTAMP(), which can cause inconsistent results during refresh. The query must be written in a way that allows Snowflake to reliably calculate incremental changes.
Q39. How many levels of dependencies can you create with Dynamic Tables (i.e., chaining)?
You can create long chains of dependencies, but there is a hard limit. A single Dynamic Table can depend on up to 100 other tables or views, including other Dynamic Tables. Furthermore, the entire graph of dependencies for any single Dynamic Table is limited to a maximum depth of 1000 tables. This is usually more than sufficient for even very complex data pipelines.
Q40. Are there any limitations with schema evolution or altering a Dynamic Table definition?
Yes, this is a significant consideration. While you can alter a Dynamic Table’s definition, certain changes to the underlying source tables will break it. If a column used in the Dynamic Table’s query is dropped or renamed in a source table, the refresh process will fail, and the Dynamic Table will enter a suspended state. You must then manually update the Dynamic Table’s SQL query to match the new source schema before it can resume refreshing.