I'm writing an app that handles scheduling time off for some of our employees. As part of this, I need to calculate how many minutes throughout the day that they have requested off.
In the first version of this tool, we disallowed overlapping time off requests, because we wanted to be able to just add up the total of StartTime
minus EndTime
for all requests. Preventing overlaps makes this calculation very fast.
This has become problematic, because Managers now want to schedule team meetings but are unable to do so when someone has already asked for the day off.
So, in the new version of the tool, we have a requirement to allow overlapping requests.
Here is an example set of data like what we have:
UserId | StartDate | EndDate
----------------------------
1 | 2:00 | 4:00
1 | 3:00 | 5:00
1 | 3:45 | 9:00
2 | 6:00 | 9:00
2 | 7:00 | 8:00
3 | 2:00 | 3:00
3 | 4:00 | 5:00
4 | 1:00 | 7:00
The result that I need to get, as efficiently as possible, is this:
UserId | StartDate | EndDate
----------------------------
1 | 2:00 | 9:00
2 | 6:00 | 9:00
3 | 2:00 | 3:00
3 | 4:00 | 5:00
4 | 1:00 | 7:00
We can easily detect overlaps with this query:
select
*
from
requests r1
cross join
requests r2
where
r1.RequestId < r2.RequestId
and
r1.StartTime < r2.EndTime
and
r2.StartTime < r1.EndTime
This is, in fact, how we were detecting and preventing the problems originally.
Now, we are trying to merge the overlapping items, but I'm reaching the limits of my SQL ninja skills.
It wouldn't be too hard to come up with a method using temp tables, but we want to avoid this if at all possible.
Is there a set-based way to merge overlapping rows?
It would also be acceptable for the all of the rows to show up, as long as they were collapsed into just their time. For example if someone wants off from three to five, and from four to six, it would be acceptable for them to have two rows, one from three to five, and the next from five to six OR one from three to four, and the next from four to six.
Also, here is a little test bench:
DECLARE @requests TABLE
(
UserId int,
StartDate time,
EndDate time
)
INSERT INTO @requests (UserId, StartDate, EndDate) VALUES
(1, '2:00', '4:00'),
(1, '3:00', '5:00'),
(1, '3:45', '9:00'),
(2, '6:00', '9:00'),
(2, '7:00', '8:00'),
(3, '2:00', '3:00'),
(3, '4:00', '5:00'),
(4, '1:00', '7:00');
Disadvantages of CTECTE's members cannot use the following clauses of keywords Distinct, Group By, Having, Top, Joins limiting by this type of the queries that can be created and reducing their complexity. The Recursive member can refer to the CTE only once.
Multiple CTE query definitions can be defined in a CTE. A CTE must be followed by a single SELECT statement. INSERT , UPDATE , DELETE , and MERGE statements aren't supported.
Advantages of CTE CTE improves the code readability. CTE provides recursive programming. CTE makes code maintainability easier. Though it provides similar functionality as a view, it will not store the definition in metadata.
No, you can't use CTE outside the CTE. The only way is to use temporary table ;) ;WITH CTE AS ( --SELECT ... )
;WITH new_grp AS (
SELECT r1.UserId, r1.StartTime
FROM @requests r1
WHERE NOT EXISTS (
SELECT *
FROM @requests r2
WHERE r1.UserId = r2.UserId
AND r2.StartTime < r1.StartTime
AND r2.EndTime >= r1.StartTime)
GROUP BY r1.UserId, r1.StartTime -- there can be > 1
),r AS (
SELECT r.RequestId, r.UserId, r.StartTime, r.EndTime
,count(*) AS grp -- guaranteed to be 1+
FROM @requests r
JOIN new_grp n ON n.UserId = r.UserId AND n.StartTime <= r.StartTime
GROUP BY r.RequestId, r.UserId, r.StartTime, r.EndTime
)
SELECT min(RequestId) AS RequestId
,UserId
,min(StartTime) AS StartTime
,max(EndTime) AS EndTime
FROM r
GROUP BY UserId, grp
ORDER BY UserId, grp
Now produces the requested result and really covers all possible cases, including disjunct sub-groups and duplicates. Have a look at the comments to the test data in the working demo at data.SE.
CTE 1
Find the (unique!) points in time where a new group of overlapping intervals starts.
CTE 2
Count the starts of new group up to (and including) every individual interval, thereby forming a unique group number per user.
Final SELECT
Merge the groups, take earlies start and latest end for groups.
I faced some difficulty, because T-SQL window functions max()
or sum()
do not accept an ORDER BY
clause in a in a window. They can only compute one value per partition, which makes it impossible to compute a running sum / count per partition. Would work in PostgreSQL or Oracle (but not in MySQL, of course - it has neither window functions nor CTEs).
The final solution uses one extra CTE and should be just as fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With