Master SQL Analytical Functions Interview Questions and Answers -1

Analytical Functions and Their Usage

Analytical functions are powerful SQL tools that enable complex calculations while preserving row-level detail. They shine in scenarios requiring:

  • Rankings and percentiles (top N per group, quartiles)
  • Moving calculations (averages, sums)
  • Period-over-period comparisons (monthly growth)
  • Sessionization and sequence analysis
  • Relative calculations (differences from averages)

The key advantage is performing these operations in a single pass through the data while maintaining access to all row details. They eliminate the need for many self-joins and correlated subqueries that were traditionally used for such calculations.

Proper use of window functions can make queries both more readable and more efficient. However, they work best when you understand the window frame specifications and partition carefully to avoid excessive memory usage. For large datasets, pay special attention to the ORDER BY clauses in your OVER() specifications, as these often require sorting operations that can be expensive.

Real-world applications include financial analysis (running balances), operational metrics (rolling averages), customer behavior analysis (session grouping), and performance benchmarking (department rankings). Mastering window functions can elevate your SQL from simple data retrieval to sophisticated analytical processing.

Q1. What are analytical/window functions in SQL and how do they differ from aggregate functions?

Analytical functions, also called window functions, perform calculations across a set of table rows related to the current row. Unlike aggregate functions that collapse multiple rows into a single result, window functions maintain individual rows while adding computed values.

The key difference is that aggregate functions (like SUM, COUNT, AVG) group data and return one result per group, while window functions can return multiple results – one for each row in the original dataset.

-- Aggregate function (collapses rows)
SELECT department, AVG(salary) 
FROM employees 
GROUP BY department;

-- Window function (preserves rows)
SELECT name, department, salary, 
       AVG(salary) OVER(PARTITION BY department) as avg_dept_salary
FROM employees;
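
To make the contrast concrete, here is a quick check using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions); the table contents are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Ann', 'IT', 90000), ('Bob', 'IT', 70000), ('Cal', 'HR', 60000);
""")

# Aggregate function: collapses to one row per department
agg = conn.execute(
    "SELECT department, AVG(salary) FROM employees GROUP BY department"
).fetchall()

# Window function: keeps every employee row, annotated with the dept average
win = conn.execute(
    "SELECT name, department, salary,"
    "       AVG(salary) OVER (PARTITION BY department) FROM employees"
).fetchall()

print(len(agg), len(win))  # 2 3
```

The aggregate query returns two rows (one per department), while the window query returns all three employee rows, each carrying its department average.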

Q2. Explain the OVER() clause in analytical functions and its significance

The OVER() clause is what defines the “window” or set of rows that the analytical function operates on. It’s the heart of window functions, specifying how to partition and order the data before applying the function.

Without OVER(), functions like RANK() or AVG() wouldn’t know how to behave as window functions. The clause can be empty (OVER()) to operate on the entire result set, or contain PARTITION BY and ORDER BY specifications.

-- Simple OVER() using entire result set
SELECT name, salary,
       AVG(salary) OVER() as company_avg
FROM employees;

-- With PARTITION BY to create department windows
SELECT name, department, salary,
       AVG(salary) OVER(PARTITION BY department) as dept_avg
FROM employees;

Q3. What are the three main components of a window function syntax?

Window functions have three key parts:

  1. The function itself: Like RANK(), SUM(), LEAD(), etc.
  2. The OVER() clause: Defines the window frame
  3. Window specifications inside OVER():
    • PARTITION BY (groups rows into partitions)
    • ORDER BY (orders rows within partitions)
    • Window frame (ROWS/RANGE between boundaries)
SELECT employee_id, department, salary,
       SUM(salary) OVER(                -- 1. Function + 2. OVER() clause
           PARTITION BY department      -- 3. Window specifications:
           ORDER BY hire_date           --    partition, order,
           ROWS BETWEEN 1 PRECEDING     --    and window frame
               AND 1 FOLLOWING
       ) as moving_sum
FROM employees;

Q4. How do window functions differ from GROUP BY operations?

Window functions and GROUP BY both work with groups of data, but in fundamentally different ways:

  • GROUP BY collapses each group into a single row in the output
  • Window functions preserve all original rows while adding computed values
  • With GROUP BY, you can only select grouped columns or aggregates
  • With window functions, you can select all individual row data plus computed values
-- GROUP BY approach (loses individual row details)
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department;

-- Window function approach (keeps all rows)
SELECT name, department, 
       COUNT(*) OVER(PARTITION BY department) as dept_count
FROM employees;

Q5. Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() with examples

These ranking functions all assign numbers to rows, but handle ties differently:

  • ROW_NUMBER(): Always gives unique numbers (1,2,3), even for ties
  • RANK(): Gives same number for ties, then skips next numbers (1,2,2,4)
  • DENSE_RANK(): Gives same number for ties, but doesn’t skip (1,2,2,3)
SELECT name, salary,
       ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num,
       RANK() OVER(ORDER BY salary DESC) as rank,
       DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;

For salaries [10000, 9000, 9000, 8000], this would produce:

  • ROW_NUMBER: 1,2,3,4
  • RANK: 1,2,2,4
  • DENSE_RANK: 1,2,2,3
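
This tie-handling can be verified with a small script using Python's built-in sqlite3 module, loading the sample salaries from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 10000), ("b", 9000), ("c", 9000), ("d", 8000)])

# One row per employee: (salary, row_number, rank, dense_rank)
rows = conn.execute("""
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC),
           RANK()       OVER (ORDER BY salary DESC),
           DENSE_RANK() OVER (ORDER BY salary DESC)
    FROM employees
    ORDER BY 2
""").fetchall()

print([r[1] for r in rows])  # ROW_NUMBER: [1, 2, 3, 4]
print([r[2] for r in rows])  # RANK:       [1, 2, 2, 4]
print([r[3] for r in rows])  # DENSE_RANK: [1, 2, 2, 3]
```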

Q6. When would you use NTILE() function? Provide a practical use case

NTILE() divides rows into roughly equal buckets. It’s useful when you need to segment data into percentiles, quartiles, or other equal groups.

Practical use case: Dividing customers into spending tiers (high, medium, low) based on their purchase amounts.

-- Divide products into 4 price quartiles
SELECT product_name, price,
       NTILE(4) OVER(ORDER BY price) as price_quartile
FROM products;

-- Segment customers into 3 tiers based on yearly spending
SELECT customer_id, yearly_spend,
       NTILE(3) OVER(ORDER BY yearly_spend DESC) as spending_tier
FROM customers;

Q7. How would you find the top 3 highest salaries in each department using window functions?

We can use RANK() or DENSE_RANK() partitioned by department and ordered by salary:

WITH ranked_salaries AS (
    SELECT name, department, salary,
           DENSE_RANK() OVER(
               PARTITION BY department 
               ORDER BY salary DESC
           ) as salary_rank
    FROM employees
)
SELECT name, department, salary
FROM ranked_salaries
WHERE salary_rank <= 3;

This query first assigns ranks within each department, then filters to only include ranks 1-3.

Q8. How do you calculate a running total using window functions?

Running totals are perfect for window functions. Use SUM() with ORDER BY in the OVER clause:

SELECT date, sales,
       SUM(sales) OVER(
           ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) as running_total
FROM daily_sales;

The window frame “UNBOUNDED PRECEDING AND CURRENT ROW” means “all rows from the start up to current row”, which creates the running total effect.
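
A quick sanity check of this running-total frame, using Python's sqlite3 with a few made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (date TEXT, sales INTEGER)")
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                 [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 5)])

# Each row's total = all sales from the first date up to that row's date
totals = [row[2] for row in conn.execute("""
    SELECT date, sales,
           SUM(sales) OVER (
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM daily_sales
    ORDER BY date
""")]
print(totals)  # [10, 30, 35]
```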

Q9. Explain how to calculate moving averages with window functions

Moving averages smooth out data by averaging values within a sliding window. Specify the window frame with ROWS or RANGE:

-- 7-day centered moving average (3 before, current, 3 after)
SELECT date, temperature,
       AVG(temperature) OVER(
           ORDER BY date
           ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
       ) as moving_avg_7day
FROM weather_data;

-- 31-day centered moving average (15 before, current, 15 after)
SELECT date, stock_price,
       AVG(stock_price) OVER(
           ORDER BY date
           ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING
       ) as moving_avg_31day
FROM stock_prices;

Q10. How would you compare each employee’s salary to the average salary in their department?

This is a classic PARTITION BY use case – we create department windows and calculate averages within them:

SELECT name, department, salary,
       AVG(salary) OVER(PARTITION BY department) as dept_avg,
       salary - AVG(salary) OVER(PARTITION BY department) as diff_from_avg
FROM employees;

This shows each employee’s salary alongside their department average and the difference between them.

Q11. What is the difference between ROWS and RANGE in window frame specifications?

ROWS and RANGE both define window frames but work differently:

  • ROWS counts physical rows before/after current row
  • RANGE includes rows based on logical value ranges
-- ROWS: the current row plus exactly 2 physical rows before it
SELECT date, amount,
       SUM(amount) OVER(
           ORDER BY date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) as sum_3rows
FROM transactions;

-- RANGE: all rows with dates within 2 days of the current row
-- (interval-offset RANGE syntax varies by database; Oracle shown here)
SELECT date, amount,
       SUM(amount) OVER(
           ORDER BY date
           RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW
       ) as sum_3days
FROM transactions;

Q12. How would you write a query to get a 3-day moving average of sales?

SELECT sale_date, amount,
       AVG(amount) OVER(
           ORDER BY sale_date
           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
       ) as moving_avg_3day
FROM daily_sales;

This averages each day with its immediate neighbor before and after. For edge cases (first/last day), it will automatically adjust the window.

Q13. Explain the UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING keywords

These keywords define window frame boundaries:

  • UNBOUNDED PRECEDING: Start from the first row in the partition
  • UNBOUNDED FOLLOWING: Continue to the last row in the partition
-- Running total from partition start to current row
SELECT date, revenue,
       SUM(revenue) OVER(
           PARTITION BY region
           ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) as running_total
FROM sales;

-- Total for entire partition (current and all following rows)
SELECT employee, month, sales,
       SUM(sales) OVER(
           PARTITION BY employee
           ORDER BY month
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       ) as remaining_total
FROM employee_sales;

Q14. How do you create a cumulative sum that resets for each group?

Use PARTITION BY to define the groups where the sum should reset, then ORDER BY with UNBOUNDED PRECEDING for the cumulative behavior:

SELECT date, department, sales,
       SUM(sales) OVER(
           PARTITION BY department
           ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) as dept_cumulative
FROM department_sales;

This gives a running total of sales for each department that starts fresh for each new department.

Q15. How would you use the LAG() and LEAD() functions to compare current row values with previous/next rows?

LAG() accesses previous rows, LEAD() accesses following rows – both are great for period-over-period comparisons:

SELECT month, revenue,
       LAG(revenue, 1) OVER(ORDER BY month) as prev_month,
       revenue - LAG(revenue, 1) OVER(ORDER BY month) as month_change,
       LEAD(revenue, 1) OVER(ORDER BY month) as next_month
FROM monthly_revenue;

This shows each month’s revenue alongside the previous month’s and the difference between them.
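
Note that LAG() returns NULL for the first row (there is no previous month) and that NULL carries through the subtraction. A small sqlite3 sketch with invented figures illustrates this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_revenue (month TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO monthly_revenue VALUES (?, ?)",
                 [("2024-01", 100), ("2024-02", 120), ("2024-03", 90)])

rows = conn.execute("""
    SELECT month, revenue,
           LAG(revenue, 1)  OVER (ORDER BY month) AS prev_month,
           revenue - LAG(revenue, 1) OVER (ORDER BY month) AS month_change,
           LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
    FROM monthly_revenue
    ORDER BY month
""").fetchall()

print(rows[0])  # ('2024-01', 100, None, None, 120)  -- no previous month
print(rows[1])  # ('2024-02', 120, 100, 20, 90)
```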

Q16. Explain how to use FIRST_VALUE() and LAST_VALUE() with practical examples

These functions return values from the first/last row in the window frame:

-- Compare each employee to highest paid in department
SELECT name, department, salary,
       FIRST_VALUE(salary) OVER(
           PARTITION BY department
           ORDER BY salary DESC
       ) as top_salary
FROM employees;

-- Compare current price to yearly low
SELECT date, stock_price,
       FIRST_VALUE(stock_price) OVER(
           PARTITION BY EXTRACT(YEAR FROM date)
           ORDER BY stock_price
       ) as yearly_low,
       stock_price - FIRST_VALUE(stock_price) OVER(
           PARTITION BY EXTRACT(YEAR FROM date)
           ORDER BY stock_price
       ) as above_yearly_low
FROM stock_prices;

Q17. How would you identify gaps or islands in a sequence of numbers using window functions?

“Gaps and islands” problems involve finding missing values or contiguous blocks in sequences. Here’s how to find gaps:

WITH numbered_rows AS (
    SELECT id,
           id - ROW_NUMBER() OVER(ORDER BY id) as grp
    FROM sequence_table
    WHERE id IN (1,2,3,5,6,8,9,10)
)
SELECT MIN(id) as island_start, MAX(id) as island_end
FROM numbered_rows
GROUP BY grp;

This returns the “islands” of contiguous numbers (1-3, 5-6, 8-10); the gaps (the missing 4 and 7) are the values that fall between consecutive islands.
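
The id - ROW_NUMBER() grouping trick can be checked with Python's sqlite3 and the sample sequence above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence_table (id INTEGER)")
conn.executemany("INSERT INTO sequence_table VALUES (?)",
                 [(i,) for i in (1, 2, 3, 5, 6, 8, 9, 10)])

# Within a contiguous run, id - ROW_NUMBER() is constant, so it labels each island
islands = conn.execute("""
    WITH numbered_rows AS (
        SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
        FROM sequence_table
    )
    SELECT MIN(id), MAX(id)
    FROM numbered_rows
    GROUP BY grp
    ORDER BY 1
""").fetchall()
print(islands)  # [(1, 3), (5, 6), (8, 10)] -- so 4 and 7 are the gaps
```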

Q18. What is the performance impact of window functions compared to equivalent queries using joins/subqueries?

Window functions are generally more efficient than equivalent joins/subqueries because:

  1. They scan the data only once rather than multiple times
  2. They avoid the overhead of joining tables to themselves
  3. The database can optimize the window function execution

However, complex window functions with multiple partitions can still be expensive. They work best when:

  • The PARTITION BY columns are indexed
  • The window frames aren’t excessively large
  • You’re not using too many different window specifications in one query
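
One way to see the equivalence (though not the performance difference) is to compare a window query against the traditional correlated-subquery rewrite on a toy table; both should return identical rows. A sketch with sqlite3 and made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Ann', 'IT', 90000), ('Bob', 'IT', 70000), ('Cal', 'HR', 60000);
""")

# Window-function version: a single scan annotates each row
window_rows = conn.execute("""
    SELECT name, AVG(salary) OVER (PARTITION BY department)
    FROM employees
    ORDER BY name
""").fetchall()

# Traditional rewrite: a correlated subquery re-scans employees per row
subquery_rows = conn.execute("""
    SELECT name, (SELECT AVG(salary) FROM employees e2
                  WHERE e2.department = e1.department)
    FROM employees e1
    ORDER BY name
""").fetchall()

print(window_rows == subquery_rows)  # True: same answer, fewer scans
```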

Q19. How would you calculate month-over-month growth percentages using window functions?

WITH monthly_stats AS (
    SELECT month, revenue,
           LAG(revenue, 1) OVER(ORDER BY month) as prev_revenue
    FROM monthly_sales
)
SELECT month, revenue,
       ROUND(
           (revenue - prev_revenue) / prev_revenue * 100, 
           2
       ) as growth_pct
FROM monthly_stats
WHERE prev_revenue IS NOT NULL;

This calculates the percentage change from the previous month for each month.

Q20. Write a query to find customers who haven’t placed an order in the last 30 days but had activity before that

WITH customer_last_orders AS (
    SELECT customer_id, MAX(order_date) as last_order
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, last_order
FROM customer_last_orders
WHERE last_order < CURRENT_DATE - INTERVAL '30 days'   -- no order in the last 30 days
  AND last_order >= CURRENT_DATE - INTERVAL '90 days'; -- but active in the 90 days before

Q21. How would you implement a “sessionization” analysis using window functions to group user activities?

Sessionization groups user activities into sessions based on inactivity timeouts:

WITH activity_with_gaps AS (
    SELECT user_id, activity_time,
           LAG(activity_time) OVER(
               PARTITION BY user_id
               ORDER BY activity_time
           ) as prev_time
    FROM user_activities
),
session_starts AS (
    SELECT user_id, activity_time,
           CASE WHEN prev_time IS NULL 
                 OR activity_time - prev_time > INTERVAL '30 minutes'
                THEN 1 ELSE 0 
           END as is_new_session
    FROM activity_with_gaps
),
session_ids AS (
    SELECT user_id, activity_time,
           SUM(is_new_session) OVER(
               PARTITION BY user_id
               ORDER BY activity_time
               ROWS UNBOUNDED PRECEDING
           ) as session_id
    FROM session_starts
)
SELECT user_id, session_id, 
       MIN(activity_time) as session_start,
       MAX(activity_time) as session_end
FROM session_ids
GROUP BY user_id, session_id;

Q22. Explain how to use window functions to calculate percentiles or quartiles

Use NTILE() for equal-sized buckets or PERCENT_RANK() for precise percentiles:

-- Quartiles using NTILE(4)
SELECT student, test_score,
       NTILE(4) OVER(ORDER BY test_score) as quartile
FROM test_scores;

-- Exact percentiles
SELECT employee, salary,
       PERCENT_RANK() OVER(ORDER BY salary) as percentile
FROM employees;

-- Median using PERCENTILE_CONT
SELECT department,
       PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY salary) as median_salary
FROM employees
GROUP BY department;

Q23. What indexing strategies work best with window functions?

Effective indexing for window functions focuses on the PARTITION BY and ORDER BY clauses:

  1. Create composite indexes on (partition_columns, order_columns)
  2. For RANGE frames, ensure the ORDER BY column is indexed
  3. Consider covering indexes if you’re filtering results
-- For this query:
SELECT department, salary,
       RANK() OVER(PARTITION BY department ORDER BY salary)
FROM employees;

-- This index would help:
CREATE INDEX idx_dept_salary ON employees(department, salary);

Q24. How can you optimize a query that uses multiple window functions with different PARTITION BY clauses?

When you have multiple window functions with different partitions:

  1. Consider rewriting as separate CTEs then joining them
  2. Create separate indexes for each partition scheme
  3. Use MATERIALIZED hints for CTEs if supported
  4. Filter early to reduce the working set size
-- Instead of:
SELECT id,
       SUM(x) OVER(PARTITION BY a),
       AVG(y) OVER(PARTITION BY b),
       COUNT(*) OVER(PARTITION BY c)
FROM table;

-- Try:
WITH a_groups AS (
    SELECT id, SUM(x) OVER(PARTITION BY a) as sum_a
    FROM table
),
b_groups AS (
    SELECT id, AVG(y) OVER(PARTITION BY b) as avg_b
    FROM table
)
SELECT t.id, a.sum_a, b.avg_b
FROM table t
JOIN a_groups a ON t.id = a.id
JOIN b_groups b ON t.id = b.id;

Q25. When would you avoid using window functions for performance reasons?

Avoid window functions when:

  1. The window frames are extremely large (scanning most of the table)
  2. You have many different PARTITION BY clauses in one query
  3. You’re working with small datasets where simple joins would suffice
  4. Your database has poor window function optimization (some older systems)
  5. You need to distribute the computation (some distributed systems handle window functions poorly)

Q26. Running Sum: Write a query to calculate the running sum of salaries for employees in each department, ordered by hire date.

SELECT 
    EMPNO, ENAME, DEPTNO, SAL, HIREDATE,
    SUM(SAL) OVER (PARTITION BY DEPTNO ORDER BY HIREDATE ROWS UNBOUNDED PRECEDING) AS running_sum
FROM EMP
ORDER BY DEPTNO, HIREDATE;

Q27. Top-N per Group: Retrieve the top 2 earners in each department along with their salaries.

WITH top_earners AS (
    SELECT 
        EMPNO, ENAME, DEPTNO, SAL,
        DENSE_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS salary_rank
    FROM EMP
)
SELECT EMPNO, ENAME, DEPTNO, SAL
FROM top_earners
WHERE salary_rank <= 2
ORDER BY DEPTNO, salary_rank;

Q28. Second Last Hire: For each employee, find the date of their second-to-last hire using LEAD()/LAG().

WITH hire_order AS (
    SELECT 
        EMPNO, ENAME, HIREDATE, DEPTNO,
        ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY HIREDATE DESC) AS rn,
        LEAD(HIREDATE, 1) OVER (PARTITION BY DEPTNO ORDER BY HIREDATE DESC) AS second_last_hire
    FROM EMP
)
SELECT EMPNO, ENAME, DEPTNO, HIREDATE, second_last_hire
FROM hire_order
WHERE rn = 1;

With the rows ordered by HIREDATE descending, LEAD() looks one row ahead, so the most recent hire in each department (rn = 1) carries the department’s second-to-last hire date.

Q29. Compared to Nth Value: Show employees who earn more than the 2nd highest salary in their department.

WITH dept_ranked AS (
    SELECT 
        EMPNO, ENAME, DEPTNO, SAL,
        DENSE_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS salary_rank
    FROM EMP
),
second_highest AS (
    SELECT DEPTNO, SAL
    FROM dept_ranked
    WHERE salary_rank = 2
)
SELECT e.EMPNO, e.ENAME, e.DEPTNO, e.SAL
FROM EMP e
JOIN second_highest s ON e.DEPTNO = s.DEPTNO
WHERE e.SAL > s.SAL;

Q30. Gaps in Sequences Detection: Identify missing invoice numbers in a table (e.g., 100, 101, 103, 104,107,110 → gap at 102, gap at 105-106, gap at 108-109).

Dataset:
100, 'Alice Smith', 150.00,
101, 'Bob Johnson', 200.00,
103, 'Charlie Brown', 175.50,  -- gap: 102 is missing
104, 'Diana Prince', 300.00,
107, 'Edward Norton', 225.75,  -- gaps: 105-106 missing
110, 'Fiona Green', 180.00     -- gaps: 108-109 missing

WITH numbered_invoices AS (
    SELECT 
        invoice_num,
        LEAD(invoice_num) OVER (ORDER BY invoice_num) AS next_num
    FROM invoices
)
SELECT invoice_num + 1 AS gap_start,
       next_num - 1 AS gap_end
FROM numbered_invoices
WHERE next_num - invoice_num > 1;
  1. The CTE:
     WITH numbered_invoices AS (
         SELECT invoice_num,
                LEAD(invoice_num) OVER (ORDER BY invoice_num) AS next_num
         FROM invoices
     )
     creates a temporary result set where each row contains:
       • invoice_num: the current invoice number
       • next_num: the next invoice number in sequence (using LEAD())
  2. Intermediate results, after the CTE executes:
       invoice_num | next_num
       100         | 101
       101         | 103
       103         | 104
       104         | 107
       107         | 110
       110         | NULL
  3. Main query logic:
     SELECT invoice_num + 1 AS gap_start, next_num - 1 AS gap_end
     FROM numbered_invoices
     WHERE next_num - invoice_num > 1;

     • For each row, it calculates:
       • gap_start: current invoice + 1 (first missing number)
       • gap_end: next invoice - 1 (last missing number)
     • The WHERE clause keeps only rows where there is a gap (difference > 1)
  4. Final output:
       gap_start | gap_end
       102       | 102     (only 102 is missing between 101 and 103)
       105       | 106     (105 and 106 missing between 104 and 107)
       108       | 109     (108 and 109 missing between 107 and 110)
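
The walkthrough above can be verified end-to-end with Python's sqlite3 and the sample invoice numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_num INTEGER)")
conn.executemany("INSERT INTO invoices VALUES (?)",
                 [(n,) for n in (100, 101, 103, 104, 107, 110)])

gaps = conn.execute("""
    WITH numbered_invoices AS (
        SELECT invoice_num,
               LEAD(invoice_num) OVER (ORDER BY invoice_num) AS next_num
        FROM invoices
    )
    SELECT invoice_num + 1 AS gap_start,
           next_num - 1    AS gap_end
    FROM numbered_invoices
    WHERE next_num - invoice_num > 1
    ORDER BY gap_start
""").fetchall()
print(gaps)  # [(102, 102), (105, 106), (108, 109)]
```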

Q31. Consecutive Days: Find users who logged in for 3 or more consecutive days

WITH login_dates AS (
  SELECT 
    user_id,
    login_date,
    -- assumes at most one row per (user_id, login_date); deduplicate dates first if needed
    LAG(login_date, 2) OVER (PARTITION BY user_id ORDER BY login_date) AS two_days_prior
  FROM user_logins
)
SELECT DISTINCT user_id
FROM login_dates
WHERE login_date = two_days_prior + INTERVAL '2 days';

Q32. Previous/Next Row Comparison: Compare each month’s sales to previous month

SELECT 
  month,
  sales,
  LAG(sales, 1) OVER (ORDER BY month) AS prev_month_sales,
  sales - LAG(sales, 1) OVER (ORDER BY month) AS month_over_month_change,
  (sales - LAG(sales, 1) OVER (ORDER BY month)) / LAG(sales, 1) OVER (ORDER BY month) * 100 AS growth_percentage
FROM monthly_sales;

Q33. First and Last in Group: For each department, display the first and last hired employee’s name and hire date.

WITH department_employees AS (
  SELECT 
    department_id,
    employee_name,
    hire_date,
    FIRST_VALUE(employee_name) OVER (PARTITION BY department_id ORDER BY hire_date) AS first_employee,
    FIRST_VALUE(hire_date) OVER (PARTITION BY department_id ORDER BY hire_date) AS first_hire_date,
    LAST_VALUE(employee_name) OVER (PARTITION BY department_id ORDER BY hire_date 
      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_employee,
    LAST_VALUE(hire_date) OVER (PARTITION BY department_id ORDER BY hire_date 
      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_hire_date,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY hire_date) AS rn
  FROM employees
)
SELECT 
  department_id,
  first_employee,
  first_hire_date,
  last_employee,
  last_hire_date
FROM department_employees
WHERE rn = 1;

Q34. Cumulative Sum Reset: Running total of sales that resets each year

SELECT 
  date,
  sales,
  SUM(sales) OVER (PARTITION BY EXTRACT(YEAR FROM date) ORDER BY date) AS yearly_running_total
FROM daily_sales;

Q35. Customer Retention: Identify customers who made a purchase in the current month but not in the previous month

WITH customer_months AS (
  -- DISTINCT collapses multiple orders in the same month to one row,
  -- so LAG() sees the previous *month*, not the previous order
  SELECT DISTINCT
    customer_id,
    DATE_TRUNC('month', order_date) AS month
  FROM orders
),
customer_monthly_activity AS (
  SELECT 
    customer_id,
    month,
    LAG(month) OVER (PARTITION BY customer_id ORDER BY month) AS prev_month
  FROM customer_months
)
SELECT 
  customer_id,
  month AS current_month
FROM customer_monthly_activity
WHERE month = DATE_TRUNC('month', CURRENT_DATE)
  AND (prev_month IS NULL OR prev_month < month - INTERVAL '1 month');

Q36. Sessionization: Group user activity logs into sessions (if inactivity > 30 mins, start a new session)

WITH activity_with_gaps AS (
  SELECT 
    user_id,
    activity_time,
    LAG(activity_time) OVER (PARTITION BY user_id ORDER BY activity_time) AS prev_activity_time,
    CASE 
      WHEN EXTRACT(EPOCH FROM (activity_time - LAG(activity_time) OVER (PARTITION BY user_id ORDER BY activity_time))) / 60 > 30 
      THEN 1 
      ELSE 0 
    END AS new_session_flag
  FROM user_activity
),
session_groups AS (
  SELECT 
    user_id,
    activity_time,
    SUM(new_session_flag) OVER (PARTITION BY user_id ORDER BY activity_time) AS session_id
  FROM activity_with_gaps
)
SELECT 
  user_id,
  session_id,
  MIN(activity_time) AS session_start,
  MAX(activity_time) AS session_end,
  COUNT(*) AS activities_in_session
FROM session_groups
GROUP BY user_id, session_id;
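
The same flag-then-sum pattern can be exercised with Python's sqlite3; since SQLite lacks EXTRACT(EPOCH ...), the sketch below computes the gap with strftime('%s', ...) instead, and the timestamps are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activity (user_id INTEGER, activity_time TEXT)")
conn.executemany("INSERT INTO user_activity VALUES (?, ?)", [
    (1, "2024-01-01 10:00:00"),
    (1, "2024-01-01 10:10:00"),  # 10-minute gap -> same session
    (1, "2024-01-01 11:00:00"),  # 50-minute gap -> new session
])

sessions = conn.execute("""
    WITH flagged AS (
        SELECT user_id, activity_time,
               -- 1 when the gap to the previous activity exceeds 30 minutes
               CASE WHEN strftime('%s', activity_time)
                         - strftime('%s', LAG(activity_time) OVER (
                               PARTITION BY user_id ORDER BY activity_time)) > 30 * 60
                    THEN 1 ELSE 0 END AS new_session_flag
        FROM user_activity
    ),
    numbered AS (
        SELECT user_id, activity_time,
               SUM(new_session_flag) OVER (
                   PARTITION BY user_id ORDER BY activity_time) AS session_id
        FROM flagged
    )
    SELECT user_id, session_id, COUNT(*)
    FROM numbered
    GROUP BY user_id, session_id
    ORDER BY user_id, session_id
""").fetchall()
print(sessions)  # [(1, 0, 2), (1, 1, 1)]
```

The first two activities fall into session 0 and the third, after a 50-minute gap, starts session 1.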

Q37. Pareto Analysis (80/20 Rule): Find the top 20% of customers contributing to 80% of sales

WITH customer_sales AS (
  SELECT 
    customer_id,
    SUM(order_amount) AS total_sales
  FROM orders
  GROUP BY customer_id
),
running_totals AS (
  SELECT 
    customer_id,
    total_sales,
    -- customer_id breaks ties; ROWS avoids RANGE's peer-group lumping
    SUM(total_sales) OVER (ORDER BY total_sales DESC, customer_id
                           ROWS UNBOUNDED PRECEDING) AS running_sum,
    SUM(total_sales) OVER () AS grand_total
  FROM customer_sales
)
SELECT 
  customer_id,
  total_sales,
  running_sum / grand_total AS running_percentage
FROM running_totals
WHERE running_sum / grand_total <= 0.8
ORDER BY total_sales DESC;

Note that the running percentage cannot reference the running_sum alias in the same SELECT that defines it, so the division is done one level up.

Q38. Time Between Events: Calculate the average time between orders for each customer

WITH order_times AS (
  SELECT 
    customer_id,
    order_date,
    LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_date
  FROM orders
),
time_differences AS (
  SELECT 
    customer_id,
    EXTRACT(EPOCH FROM (next_order_date - order_date)) / 86400 AS days_between_orders
  FROM order_times
  WHERE next_order_date IS NOT NULL
)
SELECT 
  customer_id,
  AVG(days_between_orders) AS avg_days_between_orders
FROM time_differences
GROUP BY customer_id;

Q39. Partitioned Running Count: For each product category, show a running count of orders by date

SELECT 
  order_date,
  product_category,
  COUNT(*) AS daily_orders,
  SUM(COUNT(*)) OVER (PARTITION BY product_category ORDER BY order_date) AS running_category_orders
FROM orders
GROUP BY order_date, product_category
ORDER BY product_category, order_date;

Q40. Alternate Row Selection: Select every 2nd row from a result set

WITH numbered_rows AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (ORDER BY id) AS row_num
  FROM my_table
)
SELECT * 
FROM numbered_rows
WHERE MOD(row_num, 2) = 0;

Q41. Deduplication: Remove duplicate rows (keeping one copy)

WITH duplicates AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY column1, column2, column3 ORDER BY id) AS row_num
  FROM my_table
)
DELETE FROM my_table
WHERE id IN (
  SELECT id FROM duplicates WHERE row_num > 1
);

Q42. Hierarchical Running Sum: Compute a running total that respects a hierarchy (e.g., sum sales by region → country → city)

SELECT 
  region,
  country,
  city,
  sales_date,
  sales_amount,
  SUM(sales_amount) OVER (PARTITION BY region, country, city ORDER BY sales_date) AS city_running_total,
  SUM(sales_amount) OVER (PARTITION BY region, country ORDER BY sales_date) AS country_running_total,
  SUM(sales_amount) OVER (PARTITION BY region ORDER BY sales_date) AS region_running_total
FROM sales_data
ORDER BY region, country, city, sales_date;

Q43. Conditional Running Sum: Calculate a running sum that only includes values meeting a condition (e.g., sum sales where amount > 100)

SELECT 
  date,
  amount,
  SUM(CASE WHEN amount > 100 THEN amount ELSE 0 END) 
    OVER (ORDER BY date) AS conditional_running_sum
FROM transactions;
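
A small sqlite3 check of the conditional running sum, with invented amounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("2024-01-01", 50), ("2024-01-02", 200),
                  ("2024-01-03", 150), ("2024-01-04", 80)])

# Only amounts > 100 contribute; smaller amounts add 0 but still get a row
sums = [r[2] for r in conn.execute("""
    SELECT date, amount,
           SUM(CASE WHEN amount > 100 THEN amount ELSE 0 END)
               OVER (ORDER BY date) AS conditional_running_sum
    FROM transactions
    ORDER BY date
""")]
print(sums)  # [0, 200, 350, 350]
```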

Q44. Dynamic Window Sizes: Write a query where the window frame size varies by row (e.g., 7-day average for weekdays, 3-day for weekends)

SELECT
  date,
  sales,
  CASE WHEN EXTRACT(DOW FROM date) BETWEEN 1 AND 5 THEN  -- Weekday: 7-day centered window
    AVG(sales) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
  ELSE                                                   -- Weekend: 3-day centered window
    AVG(sales) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
  END AS dynamic_avg
FROM daily_sales;

If you’re looking to master window functions in Snowflake SQL, the Snowflake Window Functions documentation provides an excellent reference guide.
