
Can I use a SQL Server CTE to merge intersecting dates?

I'm writing an app that handles scheduling time off for some of our employees. As part of this, I need to calculate how many minutes of the day they have requested off.

In the first version of this tool, we disallowed overlapping time-off requests, because we wanted to be able to simply add up EndTime minus StartTime over all requests. Preventing overlaps makes this calculation very fast.
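For illustration, here is a minimal Python sketch of why overlaps break the simple sum. It is not part of the tool; the function names are hypothetical, and times are expressed as minutes after midnight. With user 1's three sample requests, summing each request's length counts the shared stretches more than once, while merging first gives the actual time off.

```python
def naive_minutes(requests):
    """Sum EndTime - StartTime for every request; overlaps are double-counted."""
    return sum(end - start for start, end in requests)


def merged_minutes(requests):
    """Merge overlapping or touching intervals, then sum their lengths."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(requests):
        if cur_end is None or start > cur_end:
            # Gap before this request: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the current interval: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# User 1's requests: 2:00-4:00, 3:00-5:00, 3:45-9:00 (minutes after midnight).
user1 = [(120, 240), (180, 300), (225, 540)]
print(naive_minutes(user1))   # 555
print(merged_minutes(user1))  # 420
```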

This has become problematic, because managers now want to schedule team meetings but cannot do so when someone has already asked for the day off.

So, in the new version of the tool, we have a requirement to allow overlapping requests.

Here is an example set of data like what we have:

UserId | StartTime | EndTime
----------------------------
 1     | 2:00      | 4:00
 1     | 3:00      | 5:00
 1     | 3:45      | 9:00
 2     | 6:00      | 9:00
 2     | 7:00      | 8:00
 3     | 2:00      | 3:00
 3     | 4:00      | 5:00
 4     | 1:00      | 7:00

The result that I need to get, as efficiently as possible, is this:

UserId | StartTime | EndTime
----------------------------
 1     | 2:00      | 9:00
 2     | 6:00      | 9:00
 3     | 2:00      | 3:00
 3     | 4:00      | 5:00
 4     | 1:00      | 7:00

We can easily detect overlaps with this query:

select
    *
from
    requests r1
cross join
    requests r2
where
    r1.UserId = r2.UserId
  and
    r1.RequestId < r2.RequestId
  and
    r1.StartTime < r2.EndTime
  and
    r2.StartTime < r1.EndTime
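A minimal sketch of this detection query run against SQLite through Python's standard sqlite3 module, using the question's sample data. Assumptions: RequestId values 1 to 8 are assigned in insertion order, and times are stored as zero-padded 'HH:MM' text so plain string comparison orders them correctly. The join also requires r1.UserId = r2.UserId so that only one employee's requests are compared with each other.

```python
import sqlite3

# (RequestId, UserId, StartTime, EndTime); RequestId values are assumed.
ROWS = [
    (1, 1, "02:00", "04:00"),
    (2, 1, "03:00", "05:00"),
    (3, 1, "03:45", "09:00"),
    (4, 2, "06:00", "09:00"),
    (5, 2, "07:00", "08:00"),
    (6, 3, "02:00", "03:00"),
    (7, 3, "04:00", "05:00"),
    (8, 4, "01:00", "07:00"),
]

OVERLAPS = """
SELECT r1.RequestId, r2.RequestId
FROM requests r1
JOIN requests r2
  ON  r1.UserId = r2.UserId          -- only compare one employee's requests
  AND r1.RequestId < r2.RequestId    -- report each pair once
  AND r1.StartTime < r2.EndTime      -- the two intervals intersect
  AND r2.StartTime < r1.EndTime
ORDER BY r1.RequestId, r2.RequestId
"""


def overlapping_pairs(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE requests (RequestId INTEGER PRIMARY KEY, UserId INTEGER,"
            " StartTime TEXT, EndTime TEXT)"
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
        return conn.execute(OVERLAPS).fetchall()
    finally:
        conn.close()


print(overlapping_pairs(ROWS))  # [(1, 2), (1, 3), (2, 3), (4, 5)]
```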

This is, in fact, how we were detecting and preventing the problems originally.

Now, we are trying to merge the overlapping items, but I'm reaching the limits of my SQL ninja skills.

It wouldn't be too hard to come up with a method using temp tables, but we want to avoid this if at all possible.

Is there a set-based way to merge overlapping rows?


Edit:

It would also be acceptable for all of the rows to show up, as long as they were trimmed so that no two rows for the same user overlap. For example, if someone wants off from three to five and from four to six, it would be acceptable for them to have two rows: one from three to five and the next from five to six, OR one from three to four and the next from four to six.
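One way to produce this split-style output, sketched in plain Python: sort each user's requests by start, trim every start up to the latest end seen so far for that user, and drop requests that are already fully covered. The function name trim_overlaps is hypothetical, and times are plain comparable values (whole hours here). This yields the "three to five, then five to six" variant.

```python
from itertools import groupby


def trim_overlaps(requests):
    """requests: (user_id, start, end) tuples. Returns non-overlapping rows,
    one per request that still covers uncovered time."""
    result = []
    for user_id, rows in groupby(sorted(requests), key=lambda r: r[0]):
        covered_until = None  # latest end seen so far for this user
        for _, start, end in rows:
            if covered_until is not None:
                # Skip the part already covered by earlier requests.
                start = max(start, covered_until)
            if start < end:
                result.append((user_id, start, end))
            covered_until = end if covered_until is None else max(covered_until, end)
    return result


# Off from three to five, and from four to six.
print(trim_overlaps([(1, 3, 5), (1, 4, 6)]))  # [(1, 3, 5), (1, 5, 6)]
```

Because each output row starts where the previously covered time ends, summing EndTime minus StartTime over these rows gives the same total as merging.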

Also, here is a little test bench:

DECLARE @requests TABLE
(
    RequestId int IDENTITY(1, 1),
    UserId int,
    StartTime time,
    EndTime time
)

INSERT INTO @requests (UserId, StartTime, EndTime) VALUES
(1, '2:00', '4:00'),
(1, '3:00', '5:00'),
(1, '3:45', '9:00'),
(2, '6:00', '9:00'),
(2, '7:00', '8:00'),
(3, '2:00', '3:00'),
(3, '4:00', '5:00'),
(4, '1:00', '7:00');
John Gietzen asked Dec 03 '11 at 01:12





1 Answer

Complete Rewrite:

;WITH new_grp AS (
   SELECT r1.UserId, r1.StartTime
   FROM   @requests r1
   WHERE  NOT EXISTS (
          SELECT *
          FROM   @requests r2
          WHERE  r1.UserId = r2.UserId
          AND    r2.StartTime <  r1.StartTime
          AND    r2.EndTime   >= r1.StartTime)
   GROUP  BY r1.UserId, r1.StartTime -- there can be > 1
   ),r AS (
   SELECT r.RequestId, r.UserId, r.StartTime, r.EndTime
         ,count(*) AS grp -- guaranteed to be 1+
   FROM   @requests r
   JOIN   new_grp n ON n.UserId = r.UserId AND n.StartTime <= r.StartTime
   GROUP  BY r.RequestId, r.UserId, r.StartTime, r.EndTime
   )
SELECT min(RequestId) AS RequestId
      ,UserId
      ,min(StartTime) AS StartTime
      ,max(EndTime)   AS EndTime
FROM   r
GROUP  BY UserId, grp
ORDER  BY UserId, grp

It now produces the requested result and covers all possible cases, including disjoint sub-groups and duplicates. See the comments on the test data in the working demo at data.SE.
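To check the query against the question's sample data without a SQL Server instance, here is a port to SQLite via Python's sqlite3 module. Assumptions: a regular table named requests replaces the @requests table variable, RequestId values 1 to 8 are assigned in insertion order, times are zero-padded 'HH:MM' strings, the inner table alias is renamed to req, and min(RequestId) is dropped from the output for brevity. The CTE logic is otherwise unchanged.

```python
import sqlite3

# (RequestId, UserId, StartTime, EndTime); RequestId values are assumed.
ROWS = [
    (1, 1, "02:00", "04:00"), (2, 1, "03:00", "05:00"), (3, 1, "03:45", "09:00"),
    (4, 2, "06:00", "09:00"), (5, 2, "07:00", "08:00"),
    (6, 3, "02:00", "03:00"), (7, 3, "04:00", "05:00"),
    (8, 4, "01:00", "07:00"),
]

MERGE = """
WITH new_grp AS (
    -- start times not covered by an earlier-starting request of the same user
    SELECT r1.UserId, r1.StartTime
    FROM requests r1
    WHERE NOT EXISTS (
        SELECT *
        FROM requests r2
        WHERE r1.UserId = r2.UserId
          AND r2.StartTime <  r1.StartTime
          AND r2.EndTime   >= r1.StartTime)
    GROUP BY r1.UserId, r1.StartTime      -- duplicate starts collapse to one
), r AS (
    -- group number = number of group starts at or before this request
    SELECT req.RequestId, req.UserId, req.StartTime, req.EndTime,
           COUNT(*) AS grp
    FROM requests req
    JOIN new_grp n ON n.UserId = req.UserId AND n.StartTime <= req.StartTime
    GROUP BY req.RequestId, req.UserId, req.StartTime, req.EndTime
)
SELECT UserId, MIN(StartTime), MAX(EndTime)
FROM r
GROUP BY UserId, grp
ORDER BY UserId, grp
"""


def merge_requests(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE requests (RequestId INTEGER PRIMARY KEY, UserId INTEGER,"
            " StartTime TEXT, EndTime TEXT)"
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
        return conn.execute(MERGE).fetchall()
    finally:
        conn.close()


print(merge_requests(ROWS))
```

With the sample rows this returns the five rows of the question's expected result.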

  • CTE 1
    Find the (unique!) points in time where a new group of overlapping intervals starts.

  • CTE 2
    For every individual interval, count the group starts up to (and including) its own start time. This count serves as a group number that is unique per user.

  • Final SELECT
    Merge each group, taking the earliest start and the latest end.
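The three steps above can be mirrored in plain Python, which may make the group numbering easier to follow. This is a hypothetical helper, not part of the answer; like the correlated NOT EXISTS subquery, it compares every request with every other one, so it is quadratic in the number of requests. Times are minutes after midnight.

```python
from collections import defaultdict


def merge_like_answer(requests):
    """requests: list of (user_id, start, end). Returns merged (user_id, start, end)."""
    # Step 1 (CTE 1): distinct start times not covered by a request of the same
    # user that starts strictly earlier and ends at or after them.
    group_starts = defaultdict(set)
    for user, start, _ in requests:
        covered = any(u == user and s < start and e >= start for u, s, e in requests)
        if not covered:
            group_starts[user].add(start)

    merged = {}
    for user, start, end in requests:
        # Step 2 (CTE 2): group number = count of group starts <= this start.
        grp = sum(1 for g in group_starts[user] if g <= start)
        # Step 3 (final SELECT): earliest start and latest end per group.
        key = (user, grp)
        if key in merged:
            lo, hi = merged[key]
            merged[key] = (min(lo, start), max(hi, end))
        else:
            merged[key] = (start, end)

    return [(user, lo, hi) for (user, _), (lo, hi) in sorted(merged.items())]


SAMPLE = [
    (1, 120, 240), (1, 180, 300), (1, 225, 540),
    (2, 360, 540), (2, 420, 480),
    (3, 120, 180), (3, 240, 300),
    (4, 60, 420),
]
print(merge_like_answer(SAMPLE))
```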

I faced some difficulty because, up to SQL Server 2008 R2, T-SQL aggregate window functions such as max() or sum() do not accept an ORDER BY clause in a window. They can only compute one value per partition, which makes it impossible to compute a running sum / count per partition. This would work in PostgreSQL or Oracle (but not in MySQL, of course, which had neither window functions nor CTEs before version 8.0). SQL Server 2012 later added ORDER BY and ROWS/RANGE framing for aggregate window functions.

The final solution uses one extra CTE and should be just as fast.
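As an aside: SQL Server 2012 and later, like PostgreSQL, Oracle and SQLite 3.25+, accept ORDER BY and a ROWS frame in an aggregate's OVER clause, so the running count can be computed directly. The sketch below is a different technique from the answer's, the "gaps and islands" pattern: a running MAX(EndTime) over earlier rows flags where a new group starts, and a running SUM of those flags numbers the groups. It is shown in SQLite through Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer; the CTE names prev and numbered and the RequestId values are made up for the example.

```python
import sqlite3

# (RequestId, UserId, StartTime, EndTime); RequestId values are assumed.
ROWS = [
    (1, 1, "02:00", "04:00"), (2, 1, "03:00", "05:00"), (3, 1, "03:45", "09:00"),
    (4, 2, "06:00", "09:00"), (5, 2, "07:00", "08:00"),
    (6, 3, "02:00", "03:00"), (7, 3, "04:00", "05:00"),
    (8, 4, "01:00", "07:00"),
]

ISLANDS = """
WITH prev AS (
    -- latest end among this user's earlier requests (NULL for the first one)
    SELECT RequestId, UserId, StartTime, EndTime,
           MAX(EndTime) OVER (
               PARTITION BY UserId ORDER BY StartTime, RequestId
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end
    FROM requests
), numbered AS (
    -- running count of "starts a new group" flags = group number
    SELECT UserId, StartTime, EndTime,
           SUM(CASE WHEN prev_end IS NULL OR StartTime > prev_end
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY UserId ORDER BY StartTime, RequestId
                     ROWS UNBOUNDED PRECEDING) AS grp
    FROM prev
)
SELECT UserId, MIN(StartTime), MAX(EndTime)
FROM numbered
GROUP BY UserId, grp
ORDER BY UserId, grp
"""


def merge_with_windows(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE requests (RequestId INTEGER PRIMARY KEY, UserId INTEGER,"
            " StartTime TEXT, EndTime TEXT)"
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
        return conn.execute(ISLANDS).fetchall()
    finally:
        conn.close()


print(merge_with_windows(ROWS))
```

Ties on StartTime are broken by RequestId so that the running aggregates are deterministic, and a request that starts exactly when an earlier one ends joins that group, matching the answer's `>=` comparison. This version reads the table once per window pass instead of running a correlated subquery per row.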

Erwin Brandstetter answered Sep 30 '22 at 04:09
