Analytical Functions and Their Usage
Analytical functions are powerful SQL tools that enable complex calculations while preserving row-level detail. They shine in scenarios requiring:
- Rankings and percentiles (top N per group, quartiles)
- Moving calculations (averages, sums)
- Period-over-period comparisons (monthly growth)
- Sessionization and sequence analysis
- Relative calculations (differences from averages)
The key advantage is performing these operations in a single pass through the data while maintaining access to all row details. They eliminate the need for many self-joins and correlated subqueries that were traditionally used for such calculations.
Master SQL Analytical Functions Interview Questions and Answers
Proper use of window functions can make queries both more readable and more efficient. However, they work best when you understand the window frame specifications and partition carefully to avoid excessive memory usage. For large datasets, pay special attention to the ORDER BY clauses in your OVER() specifications, as these often require sorting operations that can be expensive.
Real-world applications include financial analysis (running balances), operational metrics (rolling averages), customer behavior analysis (session grouping), and performance benchmarking (department rankings). Mastering window functions can elevate your SQL from simple data retrieval to sophisticated analytical processing.
Q1. What are analytical/window functions in SQL and how do they differ from aggregate functions?
Analytical functions, also called window functions, perform calculations across a set of table rows related to the current row. Unlike aggregate functions that collapse multiple rows into a single result, window functions maintain individual rows while adding computed values.
The key difference is that aggregate functions (like SUM, COUNT, AVG) group data and return one result per group, while window functions can return multiple results – one for each row in the original dataset.
-- Aggregate function (collapses rows)
SELECT department, AVG(salary)
FROM employees
GROUP BY department;

-- Window function (preserves rows)
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary
FROM employees;
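As a quick sanity check, the collapse-vs-preserve distinction can be reproduced with Python's built-in sqlite3 module (window functions require SQLite 3.25+, which ships with modern Python). The table name and salary values here are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 100), ("Bob", "IT", 200), ("Cat", "HR", 300)],
)

# Aggregate: three input rows collapse to one result per department
agg = conn.execute(
    "SELECT department, AVG(salary) FROM employees GROUP BY department"
).fetchall()

# Window: all three employee rows survive, each carrying its department average
win = conn.execute(
    "SELECT name, salary, AVG(salary) OVER (PARTITION BY department) "
    "FROM employees"
).fetchall()

print(len(agg), len(win))  # 2 rows vs. 3 rows
```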
Q2. Explain the OVER() clause in analytical functions and its significance
The OVER() clause is what defines the “window” or set of rows that the analytical function operates on. It’s the heart of window functions, specifying how to partition and order the data before applying the function.
Without OVER(), functions like RANK() or AVG() wouldn’t know how to behave as window functions. The clause can be empty (OVER()) to operate on the entire result set, or contain PARTITION BY and ORDER BY specifications.
-- Simple OVER() using the entire result set
SELECT name, salary,
       AVG(salary) OVER () AS company_avg
FROM employees;

-- With PARTITION BY to create department windows
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
Q3. What are the three main components of a window function syntax?
Window functions have three key parts:
- The function itself: Like RANK(), SUM(), LEAD(), etc.
- The OVER() clause: Defines the window frame
- Window specifications inside OVER():
- PARTITION BY (groups rows into partitions)
- ORDER BY (orders rows within partitions)
- Window frame (ROWS/RANGE between boundaries)
SELECT employee_id, department, salary,
       SUM(salary) OVER (              -- 1. The function
           PARTITION BY department     -- 2. PARTITION BY
           ORDER BY hire_date          -- 3. ORDER BY
           ROWS BETWEEN 1 PRECEDING    --    plus the window frame
                AND 1 FOLLOWING
       ) AS moving_sum
FROM employees;
Q4. How do window functions differ from GROUP BY operations?
Window functions and GROUP BY both work with groups of data, but in fundamentally different ways:
- GROUP BY collapses each group into a single row in the output
- Window functions preserve all original rows while adding computed values
- With GROUP BY, you can only select grouped columns or aggregates
- With window functions, you can select all individual row data plus computed values
-- GROUP BY approach (loses individual row details)
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;

-- Window function approach (keeps all rows)
SELECT name, department,
       COUNT(*) OVER (PARTITION BY department) AS dept_count
FROM employees;
Q5. Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() with examples
These ranking functions all assign numbers to rows, but handle ties differently:
- ROW_NUMBER(): Always gives unique numbers (1,2,3), even for ties
- RANK(): Gives same number for ties, then skips next numbers (1,2,2,4)
- DENSE_RANK(): Gives same number for ties, but doesn’t skip (1,2,2,3)
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS salary_rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
FROM employees;

(Note: RANK and DENSE_RANK are reserved words in several dialects, so avoid using them bare as column aliases.)
For salaries [10000, 9000, 9000, 8000], this would produce:
- ROW_NUMBER: 1,2,3,4
- RANK: 1,2,2,4
- DENSE_RANK: 1,2,2,3
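The tie behavior above is easy to verify with Python's sqlite3 module (SQLite 3.25+); the four salaries mirror the example in the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 10000), ("b", 9000), ("c", 9000), ("d", 8000)])

rows = conn.execute("""
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC),
           RANK()       OVER (ORDER BY salary DESC),
           DENSE_RANK() OVER (ORDER BY salary DESC)
    FROM employees
    ORDER BY salary DESC
""").fetchall()
# RANK:       1, 2, 2, 4  (skips 3 after the tie)
# DENSE_RANK: 1, 2, 2, 3  (no skip)
```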
Q6. When would you use NTILE() function? Provide a practical use case
NTILE() divides rows into roughly equal buckets. It’s useful when you need to segment data into percentiles, quartiles, or other equal groups.
Practical use case: Dividing customers into spending tiers (high, medium, low) based on their purchase amounts.
-- Divide products into 4 price quartiles
SELECT product_name, price,
       NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM products;

-- Segment customers into 3 tiers based on yearly spending
SELECT customer_id, yearly_spend,
       NTILE(3) OVER (ORDER BY yearly_spend DESC) AS spending_tier
FROM customers;
Q7. How would you find the top 3 highest salaries in each department using window functions?
We can use RANK() or DENSE_RANK() partitioned by department and ordered by salary:
WITH ranked_salaries AS (
    SELECT name, department, salary,
           DENSE_RANK() OVER (PARTITION BY department
                              ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT name, department, salary
FROM ranked_salaries
WHERE salary_rank <= 3;
This query first assigns ranks within each department, then filters to only include ranks 1-3.
Q8. How do you calculate a running total using window functions?
Running totals are perfect for window functions. Use SUM() with ORDER BY in the OVER clause:
SELECT date, sales,
       SUM(sales) OVER (ORDER BY date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily_sales;
The window frame “UNBOUNDED PRECEDING AND CURRENT ROW” means “all rows from the start up to current row”, which creates the running total effect.
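A quick runnable check of the running-total behavior, again using Python's sqlite3 module (dates and sales figures invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (date TEXT, sales INT)")
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                 [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 5)])

rows = conn.execute("""
    SELECT date, sales,
           SUM(sales) OVER (ORDER BY date
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM daily_sales
    ORDER BY date
""").fetchall()
# running totals: 10, 30, 35
```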
Q9. Explain how to calculate moving averages with window functions
Moving averages smooth out data by averaging values within a sliding window. Specify the window frame with ROWS or RANGE:
-- 7-day centered moving average
SELECT date, temperature,
       AVG(temperature) OVER (ORDER BY date
                              ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS moving_avg_7day
FROM weather_data;

-- 31-day centered moving average (15 rows each side plus the current row)
SELECT date, stock_price,
       AVG(stock_price) OVER (ORDER BY date
                              ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS moving_avg_31day
FROM stock_prices;
Q10. How would you compare each employee’s salary to the average salary in their department?
This is a classic PARTITION BY use case – we create department windows and calculate averages within them:
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg,
       salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
This shows each employee’s salary alongside their department average and the difference between them.
Q11. What is the difference between ROWS and RANGE in window frame specifications?
ROWS and RANGE both define window frames but work differently:
- ROWS counts physical rows before/after current row
- RANGE includes rows based on logical value ranges
-- ROWS: exactly the 2 physical rows before the current row
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_3rows
FROM transactions;

-- RANGE: all rows with dates within 2 days before the current row
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date
                         RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW) AS sum_3days
FROM transactions;
Q12. How would you write a query to get a 3-day moving average of sales?
SELECT sale_date, amount,
       AVG(amount) OVER (ORDER BY sale_date
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg_3day
FROM daily_sales;
This averages each day with its immediate neighbor before and after. For edge cases (first/last day), it will automatically adjust the window.
Q13. Explain the UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING keywords
These keywords define window frame boundaries:
- UNBOUNDED PRECEDING: Start from the first row in the partition
- UNBOUNDED FOLLOWING: Continue to the last row in the partition
-- Running total from partition start to current row
SELECT date, revenue,
       SUM(revenue) OVER (PARTITION BY region
                          ORDER BY date
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales;

-- Total of the current row and all following rows in the partition
SELECT employee, month, sales,
       SUM(sales) OVER (PARTITION BY employee
                        ORDER BY month
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_total
FROM employee_sales;
Q14. How do you create a cumulative sum that resets for each group?
Use PARTITION BY to define the groups where the sum should reset, then ORDER BY with UNBOUNDED PRECEDING for the cumulative behavior:
SELECT date, department, sales,
       SUM(sales) OVER (PARTITION BY department
                        ORDER BY date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS dept_cumulative
FROM department_sales;
This gives a running total of sales for each department that starts fresh for each new department.
Q15. How would you use the LAG() and LEAD() functions to compare current row values with previous/next rows?
LAG() accesses previous rows, LEAD() accesses following rows – both are great for period-over-period comparisons:
SELECT month, revenue,
       LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
       revenue - LAG(revenue, 1) OVER (ORDER BY month) AS month_change,
       LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
FROM monthly_revenue;
This shows each month’s revenue alongside the previous month’s and the difference between them.
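The NULL boundaries are worth seeing once: the first row has no previous value for LAG() and the last row has no next value for LEAD(). A small sqlite3 check with invented revenue figures:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_revenue (month TEXT, revenue INT)")
conn.executemany("INSERT INTO monthly_revenue VALUES (?, ?)",
                 [("2024-01", 100), ("2024-02", 120), ("2024-03", 90)])

rows = conn.execute("""
    SELECT month, revenue,
           LAG(revenue)  OVER (ORDER BY month) AS prev_month,
           revenue - LAG(revenue) OVER (ORDER BY month) AS month_change,
           LEAD(revenue) OVER (ORDER BY month) AS next_month
    FROM monthly_revenue
    ORDER BY month
""").fetchall()
# first row: prev_month is NULL; last row: next_month is NULL
```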
Q16. Explain how to use FIRST_VALUE() and LAST_VALUE() with practical examples
These functions return values from the first/last row in the window frame:
-- Compare each employee to the highest paid in their department
SELECT name, department, salary,
       FIRST_VALUE(salary) OVER (PARTITION BY department
                                 ORDER BY salary DESC) AS top_salary
FROM employees;

-- Compare the current price to the yearly low
SELECT date, stock_price,
       FIRST_VALUE(stock_price) OVER (PARTITION BY EXTRACT(YEAR FROM date)
                                      ORDER BY stock_price) AS yearly_low,
       stock_price - FIRST_VALUE(stock_price) OVER (PARTITION BY EXTRACT(YEAR FROM date)
                                                    ORDER BY stock_price) AS above_yearly_low
FROM stock_prices;
Q17. How would you identify gaps or islands in a sequence of numbers using window functions?
“Gaps and islands” problems involve finding missing values or contiguous blocks in sequences. Here’s how to find gaps:
WITH numbered_rows AS (
    SELECT id,
           id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM sequence_table
    WHERE id IN (1, 2, 3, 5, 6, 8, 9, 10)
)
SELECT MIN(id) AS island_start,
       MAX(id) AS island_end
FROM numbered_rows
GROUP BY grp;

This groups the contiguous runs (the "islands": 1-3, 5-6, 8-10). The gaps (the missing values 4 and 7) are then the values between the end of one island and the start of the next.
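The `id - ROW_NUMBER()` trick can be run directly in Python's sqlite3 module with the ids from the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence_table (id INT)")
conn.executemany("INSERT INTO sequence_table VALUES (?)",
                 [(i,) for i in (1, 2, 3, 5, 6, 8, 9, 10)])

# Contiguous ids share the same (id - row_number) value, so grouping
# on that difference collapses each island into one row.
islands = conn.execute("""
    WITH numbered_rows AS (
        SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
        FROM sequence_table
    )
    SELECT MIN(id), MAX(id) FROM numbered_rows GROUP BY grp ORDER BY 1
""").fetchall()
print(islands)  # [(1, 3), (5, 6), (8, 10)] -> the gaps are 4 and 7
```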
Q18. What is the performance impact of window functions compared to equivalent queries using joins/subqueries?
Window functions are generally more efficient than equivalent joins/subqueries because:
- They scan the data only once rather than multiple times
- They avoid the overhead of joining tables to themselves
- The database can optimize the window function execution
However, complex window functions with multiple partitions can still be expensive. They work best when:
- The PARTITION BY columns are indexed
- The window frames aren’t excessively large
- You’re not using too many different window specifications in one query
Q19. How would you calculate month-over-month growth percentages using window functions?
WITH monthly_stats AS (
    SELECT month, revenue,
           LAG(revenue, 1) OVER (ORDER BY month) AS prev_revenue
    FROM monthly_sales
)
SELECT month, revenue,
       ROUND((revenue - prev_revenue) / NULLIF(prev_revenue, 0) * 100, 2) AS growth_pct
FROM monthly_stats
WHERE prev_revenue IS NOT NULL;
This calculates the percentage change from the previous month for each month.
Q20. Write a query to find customers who haven’t placed an order in the last 30 days but had activity before that
WITH customer_last_orders AS (
    SELECT customer_id,
           MAX(order_date) AS last_order
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, last_order
FROM customer_last_orders
WHERE last_order < CURRENT_DATE - INTERVAL '30 days'
  AND last_order >= CURRENT_DATE - INTERVAL '90 days';  -- "recent activity" window; adjust as needed
Q21. How would you implement a “sessionization” analysis using window functions to group user activities?
Sessionization groups user activities into sessions based on inactivity timeouts:
WITH activity_with_gaps AS (
    SELECT user_id, activity_time,
           LAG(activity_time) OVER (PARTITION BY user_id
                                    ORDER BY activity_time) AS prev_time
    FROM user_activities
),
session_starts AS (
    SELECT user_id, activity_time,
           CASE WHEN prev_time IS NULL
                  OR activity_time - prev_time > INTERVAL '30 minutes'
                THEN 1 ELSE 0 END AS is_new_session
    FROM activity_with_gaps
),
session_ids AS (
    SELECT user_id, activity_time,
           SUM(is_new_session) OVER (PARTITION BY user_id
                                     ORDER BY activity_time
                                     ROWS UNBOUNDED PRECEDING) AS session_id
    FROM session_starts
)
SELECT user_id, session_id,
       MIN(activity_time) AS session_start,
       MAX(activity_time) AS session_end
FROM session_ids
GROUP BY user_id, session_id;
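The flag-then-running-sum pattern can be exercised in miniature with Python's sqlite3 module. SQLite has no INTERVAL type, so this sketch measures the gap in minutes via julianday(); the user ids and timestamps are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activities (user_id INT, activity_time TEXT)")
conn.executemany("INSERT INTO user_activities VALUES (?, ?)", [
    (1, "2024-01-01 09:00:00"), (1, "2024-01-01 09:10:00"),
    (1, "2024-01-01 10:00:00"),  # 50-minute gap -> new session
    (2, "2024-01-01 09:00:00"),
])

rows = conn.execute("""
    WITH gaps AS (
        SELECT user_id, activity_time,
               (julianday(activity_time)
                - julianday(LAG(activity_time) OVER (
                      PARTITION BY user_id ORDER BY activity_time))) * 1440
                   AS gap_minutes
        FROM user_activities
    ),
    flagged AS (
        SELECT user_id, activity_time,
               CASE WHEN gap_minutes IS NULL OR gap_minutes > 30
                    THEN 1 ELSE 0 END AS is_new_session
        FROM gaps
    )
    SELECT user_id,
           SUM(is_new_session) OVER (PARTITION BY user_id
                                     ORDER BY activity_time) AS session_id
    FROM flagged
    ORDER BY user_id, activity_time
""").fetchall()
print(rows)  # user 1 gets sessions 1, 1, 2; user 2 gets session 1
```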
Q22. Explain how to use window functions to calculate percentiles or quartiles
Use NTILE() for equal-sized buckets or PERCENT_RANK() for precise percentiles:
-- Quartiles using NTILE(4)
SELECT student, test_score,
       NTILE(4) OVER (ORDER BY test_score) AS quartile
FROM test_scores;

-- Exact percentiles
SELECT employee, salary,
       PERCENT_RANK() OVER (ORDER BY salary) AS percentile
FROM employees;

-- Median using PERCENTILE_CONT
SELECT department,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;
Q23. What indexing strategies work best with window functions?
Effective indexing for window functions focuses on the PARTITION BY and ORDER BY clauses:
- Create composite indexes on (partition_columns, order_columns)
- For RANGE frames, ensure the ORDER BY column is indexed
- Consider covering indexes if you’re filtering results
-- For this query:
SELECT department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary)
FROM employees;

-- This index would help:
CREATE INDEX idx_dept_salary ON employees(department, salary);
Q24. How can you optimize a query that uses multiple window functions with different PARTITION BY clauses?
When you have multiple window functions with different partitions:
- Consider rewriting as separate CTEs then joining them
- Create separate indexes for each partition scheme
- Use MATERIALIZED hints for CTEs if supported
- Filter early to reduce the working set size
-- Instead of:
SELECT id,
       SUM(x)   OVER (PARTITION BY a),
       AVG(y)   OVER (PARTITION BY b),
       COUNT(*) OVER (PARTITION BY c)
FROM my_table;

-- Try:
WITH a_groups AS (
    SELECT id, SUM(x) OVER (PARTITION BY a) AS sum_a FROM my_table
),
b_groups AS (
    SELECT id, AVG(y) OVER (PARTITION BY b) AS avg_b FROM my_table
)
SELECT t.id, a.sum_a, b.avg_b
FROM my_table t
JOIN a_groups a ON t.id = a.id
JOIN b_groups b ON t.id = b.id;
Q25. When would you avoid using window functions for performance reasons?
Avoid window functions when:
- The window frames are extremely large (scanning most of the table)
- You have many different PARTITION BY clauses in one query
- You’re working with small datasets where simple joins would suffice
- Your database has poor window function optimization (some older systems)
- You need to distribute the computation (some distributed systems handle window functions poorly)
Q26. Running Sum: Write a query to calculate the running sum of salaries for employees in each department, ordered by hire date.
SELECT EMPNO, ENAME, DEPTNO, SAL, HIREDATE,
       SUM(SAL) OVER (PARTITION BY DEPTNO
                      ORDER BY HIREDATE
                      ROWS UNBOUNDED PRECEDING) AS running_sum
FROM EMP
ORDER BY DEPTNO, HIREDATE;

Q27. Top-N per Group: Retrieve the top 2 earners in each department along with their salaries.
WITH top_earners AS (
    SELECT EMPNO, ENAME, DEPTNO, SAL,
           DENSE_RANK() OVER (PARTITION BY DEPTNO
                              ORDER BY SAL DESC) AS salary_rank
    FROM EMP
)
SELECT EMPNO, ENAME, DEPTNO, SAL
FROM top_earners
WHERE salary_rank <= 2
ORDER BY DEPTNO, salary_rank;
Q28. Second Last Hire: For each employee, find the date of their second-to-last hire using LEAD()/LAG().
WITH hire_order AS (
    SELECT EMPNO, ENAME, DEPTNO, HIREDATE,
           LAG(HIREDATE, 1)  OVER (PARTITION BY DEPTNO
                                   ORDER BY HIREDATE DESC) AS more_recent_hire,
           LEAD(HIREDATE, 1) OVER (PARTITION BY DEPTNO
                                   ORDER BY HIREDATE DESC) AS second_last_hire
    FROM EMP
)
SELECT DEPTNO, second_last_hire
FROM hire_order
WHERE more_recent_hire IS NULL;  -- the top row per department; its LEAD() is the second-to-last hire date
Q29. Compared to Nth Value: Show employees who earn more than the 2nd highest salary in their department.
WITH dept_ranked AS (
    SELECT EMPNO, ENAME, DEPTNO, SAL,
           DENSE_RANK() OVER (PARTITION BY DEPTNO
                              ORDER BY SAL DESC) AS salary_rank
    FROM EMP
),
second_highest AS (
    SELECT DEPTNO, SAL
    FROM dept_ranked
    WHERE salary_rank = 2
)
SELECT e.EMPNO, e.ENAME, e.DEPTNO, e.SAL
FROM EMP e
JOIN second_highest s ON e.DEPTNO = s.DEPTNO
WHERE e.SAL > s.SAL;
Q30. Gaps in Sequences Detection: Identify missing invoice numbers in a table (e.g., 100, 101, 103, 104,107,110 → gap at 102, gap at 105-106, gap at 108-109).
Dataset:
(100, 'Alice Smith', 150.00),
(101, 'Bob Johnson', 200.00),
(103, 'Charlie Brown', 175.50),  -- gap: 102 is missing
(104, 'Diana Prince', 300.00),
(107, 'Edward Norton', 225.75),  -- gap: 105-106 missing
(110, 'Fiona Green', 180.00)     -- gap: 108-109 missing
WITH numbered_invoices AS (
    SELECT invoice_num,
           LEAD(invoice_num) OVER (ORDER BY invoice_num) AS next_num
    FROM invoices
)
SELECT invoice_num + 1 AS gap_start,
       next_num - 1 AS gap_end
FROM numbered_invoices
WHERE next_num - invoice_num > 1;
How the query works:

- The CTE pairs each invoice with the next one in sequence:

WITH numbered_invoices AS (
    SELECT invoice_num,
           LEAD(invoice_num) OVER (ORDER BY invoice_num) AS next_num
    FROM invoices
)

Each row of this temporary result set contains invoice_num (the current invoice number) and next_num (the next invoice number in sequence, obtained with LEAD()).

- Intermediate results: after the CTE executes, it looks like this:

invoice_num   next_num
100           101
101           103
103           104
104           107
107           110
110           NULL

- Main query logic:

SELECT invoice_num + 1 AS gap_start,
       next_num - 1 AS gap_end
FROM numbered_invoices
WHERE next_num - invoice_num > 1;

For each row, gap_start is the current invoice + 1 (the first missing number) and gap_end is the next invoice - 1 (the last missing number). The WHERE clause keeps only the rows where the difference exceeds 1, i.e. where something is actually missing.

- Final output:

gap_start   gap_end
102         102      (only 102 is missing between 101 and 103)
105         106      (105 and 106 missing between 104 and 107)
108         109      (108 and 109 missing between 107 and 110)
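The whole walkthrough can be verified end to end with Python's sqlite3 module, loading the sample invoices and running the same query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_num INT, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [
    (100, "Alice Smith", 150.00), (101, "Bob Johnson", 200.00),
    (103, "Charlie Brown", 175.50), (104, "Diana Prince", 300.00),
    (107, "Edward Norton", 225.75), (110, "Fiona Green", 180.00),
])

gaps = conn.execute("""
    WITH numbered_invoices AS (
        SELECT invoice_num,
               LEAD(invoice_num) OVER (ORDER BY invoice_num) AS next_num
        FROM invoices
    )
    SELECT invoice_num + 1 AS gap_start,
           next_num - 1 AS gap_end
    FROM numbered_invoices
    WHERE next_num - invoice_num > 1
    ORDER BY gap_start
""").fetchall()
print(gaps)  # [(102, 102), (105, 106), (108, 109)]
```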
Q31. Consecutive Days: Find users who logged in for 3 or more consecutive days
WITH login_dates AS (
    SELECT user_id, login_date,
           LAG(login_date, 2) OVER (PARTITION BY user_id
                                    ORDER BY login_date) AS two_days_prior
    FROM user_logins  -- assumes at most one row per user per day
)
SELECT DISTINCT user_id
FROM login_dates
WHERE login_date = two_days_prior + INTERVAL '2 days';
Q32. Previous/Next Row Comparison: Compare each month’s sales to previous month
SELECT month, sales,
       LAG(sales, 1) OVER (ORDER BY month) AS prev_month_sales,
       sales - LAG(sales, 1) OVER (ORDER BY month) AS month_over_month_change,
       (sales - LAG(sales, 1) OVER (ORDER BY month))
           / NULLIF(LAG(sales, 1) OVER (ORDER BY month), 0) * 100 AS growth_percentage
FROM monthly_sales;
Q33. First and Last in Group: For each department, display the first and last hired employee’s name and hire date.
WITH department_employees AS (
    SELECT department_id, employee_name, hire_date,
           FIRST_VALUE(employee_name) OVER (PARTITION BY department_id
                                            ORDER BY hire_date) AS first_employee,
           FIRST_VALUE(hire_date) OVER (PARTITION BY department_id
                                        ORDER BY hire_date) AS first_hire_date,
           LAST_VALUE(employee_name) OVER (PARTITION BY department_id
                                           ORDER BY hire_date
                                           RANGE BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING) AS last_employee,
           LAST_VALUE(hire_date) OVER (PARTITION BY department_id
                                       ORDER BY hire_date
                                       RANGE BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS last_hire_date,
           ROW_NUMBER() OVER (PARTITION BY department_id
                              ORDER BY hire_date) AS rn
    FROM employees
)
SELECT department_id, first_employee, first_hire_date, last_employee, last_hire_date
FROM department_employees
WHERE rn = 1;
Q34. Cumulative Sum Reset: Running total of sales that resets each year
SELECT date, sales,
       SUM(sales) OVER (PARTITION BY EXTRACT(YEAR FROM date)
                        ORDER BY date) AS yearly_running_total
FROM daily_sales;
Q35. Customer Retention: Identify customers who made a purchase in the current month but not in the previous month
WITH customer_months AS (
    SELECT DISTINCT customer_id,
           DATE_TRUNC('month', order_date) AS month
    FROM orders
),
customer_monthly_activity AS (
    SELECT customer_id, month,
           LAG(month) OVER (PARTITION BY customer_id
                            ORDER BY month) AS prev_month
    FROM customer_months
)
SELECT customer_id, month AS current_month
FROM customer_monthly_activity
WHERE month = DATE_TRUNC('month', CURRENT_DATE)
  AND (prev_month IS NULL OR prev_month < month - INTERVAL '1 month');
Q36. Sessionization: Group user activity logs into sessions (if inactivity > 30 mins, start a new session)
WITH activity_with_gaps AS (
    SELECT user_id, activity_time,
           LAG(activity_time) OVER (PARTITION BY user_id
                                    ORDER BY activity_time) AS prev_activity_time,
           CASE WHEN EXTRACT(EPOCH FROM (activity_time
                    - LAG(activity_time) OVER (PARTITION BY user_id
                                               ORDER BY activity_time))) / 60 > 30
                THEN 1 ELSE 0 END AS new_session_flag
    FROM user_activity
),
session_groups AS (
    SELECT user_id, activity_time,
           SUM(new_session_flag) OVER (PARTITION BY user_id
                                       ORDER BY activity_time) AS session_id
    FROM activity_with_gaps
)
SELECT user_id, session_id,
       MIN(activity_time) AS session_start,
       MAX(activity_time) AS session_end,
       COUNT(*) AS activities_in_session
FROM session_groups
GROUP BY user_id, session_id;
Q37. Pareto Analysis (80/20 Rule): Find the top 20% of customers contributing to 80% of sales
WITH customer_sales AS (
    SELECT customer_id,
           SUM(order_amount) AS total_sales
    FROM orders
    GROUP BY customer_id
),
running_totals AS (
    SELECT customer_id, total_sales,
           SUM(total_sales) OVER (ORDER BY total_sales DESC
                                  ROWS UNBOUNDED PRECEDING) AS running_sum,
           SUM(total_sales) OVER () AS grand_total
    FROM customer_sales
)
SELECT customer_id, total_sales,
       running_sum / grand_total AS running_percentage
FROM running_totals
WHERE running_sum / grand_total <= 0.8
ORDER BY total_sales DESC;

(Note: the running percentage must be computed in an outer query, because most dialects don't allow referencing a column alias like running_sum in the same SELECT that defines it.)
Q38. Time Between Events: Calculate the average time between orders for each customer
WITH order_times AS (
    SELECT customer_id, order_date,
           LEAD(order_date) OVER (PARTITION BY customer_id
                                  ORDER BY order_date) AS next_order_date
    FROM orders
),
time_differences AS (
    SELECT customer_id,
           EXTRACT(EPOCH FROM (next_order_date - order_date)) / 86400 AS days_between_orders
    FROM order_times
    WHERE next_order_date IS NOT NULL
)
SELECT customer_id,
       AVG(days_between_orders) AS avg_days_between_orders
FROM time_differences
GROUP BY customer_id;
Q39. Partitioned Running Count: For each product category, show a running count of orders by date
SELECT order_date, product_category,
       COUNT(*) AS daily_orders,
       SUM(COUNT(*)) OVER (PARTITION BY product_category
                           ORDER BY order_date) AS running_category_orders
FROM orders
GROUP BY order_date, product_category
ORDER BY product_category, order_date;
Q40. Alternate Row Selection: Select every 2nd row from a result set
WITH numbered_rows AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY id) AS row_num
    FROM my_table
)
SELECT *
FROM numbered_rows
WHERE MOD(row_num, 2) = 0;
Q41. Deduplication: Remove duplicate rows (keeping one copy)
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column1, column2, column3
                              ORDER BY id) AS row_num
    FROM my_table
)
DELETE FROM my_table
WHERE id IN (
    SELECT id FROM duplicates WHERE row_num > 1
);
Q42. Hierarchical Running Sum: Compute a running total that respects a hierarchy (e.g., sum sales by region → country → city)
SELECT region, country, city, sales_date, sales_amount,
       SUM(sales_amount) OVER (PARTITION BY region, country, city
                               ORDER BY sales_date) AS city_running_total,
       SUM(sales_amount) OVER (PARTITION BY region, country
                               ORDER BY sales_date) AS country_running_total,
       SUM(sales_amount) OVER (PARTITION BY region
                               ORDER BY sales_date) AS region_running_total
FROM sales_data
ORDER BY region, country, city, sales_date;
Q43. Conditional Running Sum: Calculate a running sum that only includes values meeting a condition (e.g., sum sales where amount > 100)
SELECT date, amount,
       SUM(CASE WHEN amount > 100 THEN amount ELSE 0 END)
           OVER (ORDER BY date) AS conditional_running_sum
FROM transactions;
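The CASE-inside-SUM pattern is easy to verify with Python's sqlite3 module; the three transaction amounts are invented, with only the two above 100 contributing to the running sum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (date TEXT, amount INT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("2024-01-01", 50), ("2024-01-02", 150), ("2024-01-03", 200)])

rows = conn.execute("""
    SELECT date, amount,
           SUM(CASE WHEN amount > 100 THEN amount ELSE 0 END)
               OVER (ORDER BY date) AS conditional_running_sum
    FROM transactions
    ORDER BY date
""").fetchall()
# the 50 is skipped, so the running sums are 0, 150, 350
```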
Q44. Dynamic Window Sizes: Write a query where the window frame size varies by row (e.g., 7-day average for weekdays, 3-day for weekends)
SELECT date, sales,
       CASE WHEN EXTRACT(DOW FROM date) BETWEEN 1 AND 5 THEN  -- weekday: 7-day window
           AVG(sales) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
       ELSE                                                   -- weekend: 3-day window
           AVG(sales) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
       END AS dynamic_avg
FROM daily_sales;
If you’re looking to master window functions in Snowflake SQL, the Snowflake Window Functions Documentation provides an excellent reference guide.