Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL: sum datetimes without counting overlapping periods twice

Tags:

mysql

I need to count the total amount of time spent based on a table like:

id | start_time | end_time |

where periods can overlap. I need to count ovelpapping periods only once.

E.g. if I have periods like these:

*----A----*              *------C-----* *----------D----------*
                  *-----B-----*              *---E---*

the sum will be: (A.end-A.start) + (C.end - B.start) + (D.end - D.start)

I'm a bit confused with the approach I should use to write this query and will be grateful for help.

like image 609
svz Avatar asked Apr 03 '13 07:04

svz


Video Answer


3 Answers

Ok, I seriously insist you test this in all manners before using it in production. Especially test what happens if there are MULTIPLE overlaps for 1 time span.

What this query does is calculate the duration of each time span, and how much overlap exists with other time spans that have a higher id.

select
    t1.id,
    t1.start_time,
    t1.end_time,
    t1.end_time - t1.start_time as duration,
    sum(
          if(t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  , t1.end_time - t1.start_time, 0) -- t2 completely around t1
        + if(t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  , t2.end_time - t2.start_time, 0) -- t2 completely within t1
        + if(t2.start_time <  t1.start_time and t2.end_time >  t1.start_time and t2.end_time   < t1.end_time  , t2.end_time - t1.start_time, 0) -- t2 starts before t1 starts and overlaps partially
        + if(t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time   and t2.start_time > t1.start_time, t1.end_time - t2.start_time, 0) -- t2 starts before t1 ends and overlaps partially
    ) as overlap
from
    times t1
    left join times t2 on
        t2.id > t1.id --  t2.id is greater than t1.id
        and (
               (t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  ) -- t2 completely around t1
            or (t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  ) -- t2 completely within t1
            or (t2.start_time <  t1.start_time and t2.end_time >  t1.start_time) -- t2 starts before t1 starts and overlaps
            or (t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time  ) -- t2 starts before t1 ends and overlaps
        )
group by
    t1.id

So what you want to have in the end is this:

select
    sum(t.duration) - sum(t.overlap) as filtered_duration
from
    (
        OTHER QUERY HERE
    ) as t

So in the end you have this query:

select
    sum(t.duration) - sum(t.overlap) as filtered_duration
from
    (
        select
            t1.id,
            t1.start_time,
            t1.end_time,
            t1.end_time - t1.start_time as duration,
            sum(
                  if(t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  , t1.end_time - t1.start_time, 0) -- t2 completely around t1
                + if(t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  , t2.end_time - t2.start_time, 0) -- t2 completely within t1
                + if(t2.start_time <  t1.start_time and t2.end_time >  t1.start_time and t2.end_time   < t1.end_time  , t2.end_time - t1.start_time, 0) -- t2 starts before t1 starts and overlaps partially
                + if(t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time   and t2.start_time > t1.start_time, t1.end_time - t2.start_time, 0) -- t2 starts before t1 ends and overlaps partially
            ) as overlap
        from
            times t1
            left join times t2 on
                t2.id > t1.id --  t2.id is greater than t1.id
                and (
                       (t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  ) -- t2 completely around t1
                    or (t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  ) -- t2 completely within t1
                    or (t2.start_time <  t1.start_time and t2.end_time >  t1.start_time) -- t2 starts before t1 starts and overlaps
                    or (t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time  ) -- t2 starts before t1 ends and overlaps
                )
        group by
            t1.id
    ) as t
like image 101
nl-x Avatar answered Oct 18 '22 01:10

nl-x


I'd like to suggest an other way to get the time, while being sure, that the result is correct. But I don't know, how to get this done with MySQL in complete.

I'll reuse the above example with the following hours - and there might even be a 3rd-level entry "F":

1         3              7           12 13    (15 16)        20
|----A----|              |------C-----| |----------D----------|
                  |-----B-----|              |---E---|
                  5           9              14     17
                                                |F|
  1. Query a combined list of all timestamps ordered by time and add the type of each "action"

    SELECT 1 as onoff, start_time as time FROM table
    UNION
    SELECT -1 as onoff, end_time as time FROM table
    ORDER BY time
    
  2. Process the list by a loop (?) with a temporary counter, which increments by 1 on starts/logins and decrements by 1 on ends/logouts

    The counter should cause the script to add a new row to a temporary table with tmp.start=<time>, if it changes from 0 to 1 and update tmp.end=<time> on the previous row in the temp. table, if it changes from 1 to 0.

    The script would do that for the example above like follows:

    QUERY                       TMP TABLE
    onoff | time  | ctr         ID | start | end
    1     | 01:00 | 1           1  | 01:00 |            (record 1 added,   ctr 0->1)
    -1    | 03:00 | 0           1  | 01:00 | 03:00      (record 1 updated, ctr 1->0)
    1     | 05:00 | 1           2  | 05:00 |            (record 2 added,   ctr 0->1)
    1     | 07:00 | 2                                   (nothing to do)
    -1    | 09:00 | 1                                   (nothing to do)
    -1    | 12:00 | 0           2  | 05:00 | 12:00      (record 2 updated, ctr 1->0)
    1     | 13:00 | 1           3  | 13:00 |            (record 3 added,   ctr 0->1)
    1     | 14:00 | 2                                   (nothing to do)
    1     | 15:00 | 3                                   (nothing to do)
    -1    | 16:00 | 2                                   (nothing to do)
    -1    | 17:00 | 1                                   (nothing to do)
    -1    | 20:00 | 0           3  | 13:00 | 20:00      (record 3 updated, ctr 1->0)
    
  3. The last step is very easy: Get the timestampdiff() from start to end in the unit, you need/like it to be and do any further filtering or grouping.

    For example: To use the data elsewhere

    SELECT ID, start, end, timestampdiff(MINUTE, start, end) FROM tmp
    

    or for example: Sum up worktime / time logged in per user

    SELECT user_id, SUM(timestampdiff(MINUTE, start, end)) FROM tmp GROUP BY user_id
    

I'm sure, this will give the correct duration for any level of nesting, but does anyone know how to accomplish this in MySQL? I'd like to use this, too.

Best regards

PS: The script might also "close" the last session or throw an error, if it ends with counter > 1 and throw an error, if the counter becomes < 0 at any time

like image 24
user2347378 Avatar answered Oct 17 '22 23:10

user2347378


I wrote a similar query for another question so I thought I'd adjust it for this question incase anyone is interested.

SELECT SUM(a.end_time - a.start_time) total_duration
  FROM (
    SELECT MIN(g.start_time) start_time, MAX(g.end_time) end_time 
      FROM (
        SELECT @group_id := @group_id + (@end_time IS NULL OR o.start_time > @end_time) group_id,
               start_time,
               @end_time := CAST(CASE 
                 WHEN (@end_time IS NULL OR o.start_time > @end_time) THEN o.end_time
                 ELSE GREATEST(o.end_time, @end_time)
               END AS DATETIME) end_time  
          FROM times o
          JOIN (SELECT @group_id := 0, @end_time := NULL) init
      ORDER BY o.start_time ASC  
            ) g
  GROUP BY  g.group_id  
        ) a

The inner-most query groups together your times in overlapping groups stretching the end_time where appropriate. The end_time flexes to deal with times completely enclosed by the previous.

The next wrapping query extracts the full time range from each group.

The outer query sums up the diffs for each group.

like image 1
Arth Avatar answered Oct 18 '22 01:10

Arth