COUNT() OVER conditioned on the CURRENT ROW

Question

Given each row that represents a task, with start time and end time, how can I calculate the number of running tasks (i.e. that started and not ended) at the time each task starts (including itself) using a window function with COUNT OVER? Is a window function even the right approach?

Example, given table tasks:

task_id  start_time  end_time
   a         1          10
   b         2           5
   c         5          15
   d         8          13
   e        12          20
   f        21          30

Calculate running_tasks:

task_id  start_time  end_time  running_tasks
   a         1          10           1         # a
   b         2           5           2         # a,b
   c         5          15           2         # a,c (b has ended)
   d         8          13           3         # a,c,d
   e        12          20           3         # c,d,e (a has ended)
   f        21          30           1         # f (c,d,e have ended)

David דודו Markovitz · Accepted Answer

select      task_id,start_time,end_time,running_tasks 

from       (select      task_id,tm,op,start_time,end_time

                       ,sum(op) over 
                        (
                            order by    tm,op 
                            rows        unbounded preceding
                        ) as running_tasks 

            from       (select      task_id,start_time as tm,1 as op,start_time,end_time 
                        from        tasks 

                        union   all 

                        select      task_id,end_time as tm,-1 as op,start_time,end_time 
                        from        tasks 
                        ) t 
            )t 

where       op = 1
;

Elliott Brossard · Answer

You can use a correlated subquery, which in this case is a self-join; no analytic functions are needed. After enabling standard SQL (uncheck "Use Legacy SQL" under "Show Options" in the UI) you can run this example:

WITH tasks AS (
  SELECT
    task_id,
    start_time,
    end_time
  FROM UNNEST(ARRAY<STRUCT<task_id STRING, start_time INT64, end_time INT64>>[
    ('a', 1, 10),
    ('b', 2, 5),
    ('c', 5, 15),
    ('d', 8, 13),
    ('e', 12, 20),
    ('f', 21, 30)
  ])
)
SELECT
  *,
  (SELECT COUNT(*) FROM tasks t2
   WHERE t.start_time >= t2.start_time AND
   t.start_time < t2.end_time) AS running_tasks
FROM tasks t
ORDER BY task_id;

COUNT() OVER conditioned on the CURRENT ROW

Tags:

sql

window-functions

google-bigquery

New Dev

2 Answers

David דודו Markovitz

Elliott Brossard

Recent Activity

Donate For Us

COUNT() OVER conditioned on the CURRENT ROW

Tags:

sql

window-functions

google-bigquery

New Dev

2 Answers

David דודו Markovitz

Elliott Brossard

Related questions

Recent Activity

Donate For Us