Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping rows considering "difference" between rows

I've got a table which contains start-times (using number in example to keep it simple), and durations of events.

I would like to identify "blocks" and their start- and the end-time.
Whenever the difference between the end-time (start-time + duration) of the previous row (sorted by start-time) and the start-time of the current row is >=5, a new "block" should begin.

This is my test-data, including an attempt of a graphical explanation in the comments:

WITH test_data AS (
  SELECT  0 s, 2 dur FROM dual UNION ALL   --# ■■
  SELECT  2  , 2     FROM dual UNION ALL   --#   ■■
  SELECT 10  , 1     FROM dual UNION ALL   --#           ■
  SELECT 13  , 4     FROM dual UNION ALL   --#              ■■■■
  SELECT 15  , 4     FROM dual             --#                ■■■■
)
--# Should return
--#   0 ..  4                              --# ■■■■
--#  10 .. 19                              --#           ■■■■■■■■■

The first block starts at 0 and ends at 4. Since the difference to the next row is >=5, start another block at 10 which ends at 19.


I can identify the first row of a block, using LAG, but I have not yet found out how to proceed.

And I could solve the problem in a PL/SQL-loop, but I'm trying to avoid that for performance reasons.


Any suggestions on how to write this query?

Thanks in advance, Peter

like image 919
Peter Lang Avatar asked Nov 25 '10 13:11

Peter Lang


2 Answers

I use subqueries with analytics to identify and group contiguous ranges:

SQL> WITH test_data AS (
  2    SELECT  0 s, 2 dur FROM dual UNION ALL   --# ■■
  3    SELECT  2  , 2     FROM dual UNION ALL   --#   ■■
  4    SELECT 10  , 1     FROM dual UNION ALL   --#           ■
  5    SELECT 13  , 4     FROM dual UNION ALL   --#              ■■■■
  6    SELECT 15  , 4     FROM dual             --#                ■■■■
  7  )
  8  SELECT MIN(s) "begin", MAX(s + dur) "end"
  9    FROM (SELECT s, dur, SUM(gap) over(ORDER BY s) my_group
 10             FROM (SELECT s, dur,
 11                           CASE
 12                              WHEN lag(s + dur) over(ORDER BY s) >= s - 5 THEN
 13                               0
 14                              ELSE
 15                               1
 16                           END gap
 17                      FROM test_data
 18                     ORDER BY s))
 19   GROUP BY my_group;

     begin        end
---------- ----------
         0          4
        10         19
like image 108
Vincent Malgrat Avatar answered Sep 18 '22 23:09

Vincent Malgrat


The code gets a bit complicated with a number of subqueries, etc. The may be instances of data where this doesn't work but I can't think of any off the top of my head.

Working with temporal data is always a pain!

WITH test_data AS (
  SELECT  0 s, 2 dur FROM dual UNION ALL   --# ■■
  SELECT  2  , 2     FROM dual UNION ALL   --#   ■■
  SELECT 10  , 1     FROM dual UNION ALL   --#           ■
  SELECT 13  , 4     FROM dual UNION ALL   --#              ■■■■
  SELECT 15  , 4     FROM dual             --#                ■■■■
)
select 
-- Group on each block
  min(start_time) as s, 
  max(end_time) - min(start_time) as dur
from (
  select 
    start_time,
    duration, 
    end_time, 
-- number the blocks sequentially 
    sum(is_block_start) over (order by start_time) as block_num
  from (
    select 
      start_time, 
      duration, 
      end_time, 
-- Mark the start of each block
      case 
        when nvl2(prev_end_time, start_time - prev_end_time,5) >= 5 
        then 1 else 0 end as is_block_start
    from (
      select 
        s as start_time, 
        dur as duration, 
        s+dur as end_time,
        lag(s+dur) over (order by s) prev_end_time
      from test_data
    )
  )
)
group by block_num
like image 43
Mike Meyers Avatar answered Sep 20 '22 23:09

Mike Meyers