I need to count the total amount of time spent based on a table like:
id | start_time | end_time |
where periods can overlap. I need to count ovelpapping periods only once.
E.g. if I have periods like these:
*----A----* *------C-----* *----------D----------* *-----B-----* *---E---*
the sum will be: (A.end-A.start) + (C.end - B.start) + (D.end - D.start)
I'm a bit confused with the approach I should use to write this query and will be grateful for help.
Ok, I seriously insist you test this in all manners before using it in production. Especially test what happens if there are MULTIPLE overlaps for 1 time span.
What this query does is calculate the duration of each time span, and how much overlap exists with other time spans that have a higher id.
select
t1.id,
t1.start_time,
t1.end_time,
t1.end_time - t1.start_time as duration,
sum(
if(t2.start_time < t1.start_time and t2.end_time > t1.end_time , t1.end_time - t1.start_time, 0) -- t2 completely around t1
+ if(t2.start_time >= t1.start_time and t2.end_time <= t1.end_time , t2.end_time - t2.start_time, 0) -- t2 completely within t1
+ if(t2.start_time < t1.start_time and t2.end_time > t1.start_time and t2.end_time < t1.end_time , t2.end_time - t1.start_time, 0) -- t2 starts before t1 starts and overlaps partially
+ if(t2.start_time < t1.end_time and t2.end_time > t1.end_time and t2.start_time > t1.start_time, t1.end_time - t2.start_time, 0) -- t2 starts before t1 ends and overlaps partially
) as overlap
from
times t1
left join times t2 on
t2.id > t1.id -- t2.id is greater than t1.id
and (
(t2.start_time < t1.start_time and t2.end_time > t1.end_time ) -- t2 completely around t1
or (t2.start_time >= t1.start_time and t2.end_time <= t1.end_time ) -- t2 completely within t1
or (t2.start_time < t1.start_time and t2.end_time > t1.start_time) -- t2 starts before t1 starts and overlaps
or (t2.start_time < t1.end_time and t2.end_time > t1.end_time ) -- t2 starts before t1 ends and overlaps
)
group by
t1.id
So what you want to have in the end is this:
select
sum(t.duration) - sum(t.overlap) as filtered_duration
from
(
OTHER QUERY HERE
) as t
So in the end you have this query:
select
sum(t.duration) - sum(t.overlap) as filtered_duration
from
(
select
t1.id,
t1.start_time,
t1.end_time,
t1.end_time - t1.start_time as duration,
sum(
if(t2.start_time < t1.start_time and t2.end_time > t1.end_time , t1.end_time - t1.start_time, 0) -- t2 completely around t1
+ if(t2.start_time >= t1.start_time and t2.end_time <= t1.end_time , t2.end_time - t2.start_time, 0) -- t2 completely within t1
+ if(t2.start_time < t1.start_time and t2.end_time > t1.start_time and t2.end_time < t1.end_time , t2.end_time - t1.start_time, 0) -- t2 starts before t1 starts and overlaps partially
+ if(t2.start_time < t1.end_time and t2.end_time > t1.end_time and t2.start_time > t1.start_time, t1.end_time - t2.start_time, 0) -- t2 starts before t1 ends and overlaps partially
) as overlap
from
times t1
left join times t2 on
t2.id > t1.id -- t2.id is greater than t1.id
and (
(t2.start_time < t1.start_time and t2.end_time > t1.end_time ) -- t2 completely around t1
or (t2.start_time >= t1.start_time and t2.end_time <= t1.end_time ) -- t2 completely within t1
or (t2.start_time < t1.start_time and t2.end_time > t1.start_time) -- t2 starts before t1 starts and overlaps
or (t2.start_time < t1.end_time and t2.end_time > t1.end_time ) -- t2 starts before t1 ends and overlaps
)
group by
t1.id
) as t
I'd like to suggest an other way to get the time, while being sure, that the result is correct. But I don't know, how to get this done with MySQL in complete.
I'll reuse the above example with the following hours - and there might even be a 3rd-level entry "F":
1 3 7 12 13 (15 16) 20
|----A----| |------C-----| |----------D----------|
|-----B-----| |---E---|
5 9 14 17
|F|
Query a combined list of all timestamps ordered by time and add the type of each "action"
SELECT 1 as onoff, start_time as time FROM table
UNION
SELECT -1 as onoff, end_time as time FROM table
ORDER BY time
Process the list by a loop (?) with a temporary counter, which increments by 1 on starts/logins and decrements by 1 on ends/logouts
The counter should cause the script to add a new row to a temporary table with tmp.start=<time>
, if it changes from 0 to 1
and update tmp.end=<time>
on the previous row in the temp. table, if it changes from 1 to 0.
The script would do that for the example above like follows:
QUERY TMP TABLE
onoff | time | ctr ID | start | end
1 | 01:00 | 1 1 | 01:00 | (record 1 added, ctr 0->1)
-1 | 03:00 | 0 1 | 01:00 | 03:00 (record 1 updated, ctr 1->0)
1 | 05:00 | 1 2 | 05:00 | (record 2 added, ctr 0->1)
1 | 07:00 | 2 (nothing to do)
-1 | 09:00 | 1 (nothing to do)
-1 | 12:00 | 0 2 | 05:00 | 12:00 (record 2 updated, ctr 1->0)
1 | 13:00 | 1 3 | 13:00 | (record 3 added, ctr 0->1)
1 | 14:00 | 2 (nothing to do)
1 | 15:00 | 3 (nothing to do)
-1 | 16:00 | 2 (nothing to do)
-1 | 17:00 | 1 (nothing to do)
-1 | 20:00 | 0 3 | 13:00 | 20:00 (record 3 updated, ctr 1->0)
The last step is very easy: Get the timestampdiff()
from start
to end
in the unit, you need/like it to be and do any further filtering or grouping.
For example: To use the data elsewhere
SELECT ID, start, end, timestampdiff(MINUTE, start, end) FROM tmp
or for example: Sum up worktime / time logged in per user
SELECT user_id, SUM(timestampdiff(MINUTE, start, end)) FROM tmp GROUP BY user_id
I'm sure, this will give the correct duration for any level of nesting, but does anyone know how to accomplish this in MySQL? I'd like to use this, too.
Best regards
PS: The script might also "close" the last session or throw an error, if it ends with counter > 1 and throw an error, if the counter becomes < 0 at any time
I wrote a similar query for another question so I thought I'd adjust it for this question incase anyone is interested.
SELECT SUM(a.end_time - a.start_time) total_duration
FROM (
SELECT MIN(g.start_time) start_time, MAX(g.end_time) end_time
FROM (
SELECT @group_id := @group_id + (@end_time IS NULL OR o.start_time > @end_time) group_id,
start_time,
@end_time := CAST(CASE
WHEN (@end_time IS NULL OR o.start_time > @end_time) THEN o.end_time
ELSE GREATEST(o.end_time, @end_time)
END AS DATETIME) end_time
FROM times o
JOIN (SELECT @group_id := 0, @end_time := NULL) init
ORDER BY o.start_time ASC
) g
GROUP BY g.group_id
) a
The inner-most query groups together your times in overlapping groups stretching the end_time where appropriate. The end_time flexes to deal with times completely enclosed by the previous.
The next wrapping query extracts the full time range from each group.
The outer query sums up the diffs for each group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With