Can I use a SQL Server CTE to merge intersecting dates?

Tags:

sql

sql-server

sql-server-2008

common-table-expression

I'm writing an app that handles scheduling time off for some of our employees. As part of this, I need to calculate how many minutes throughout the day that they have requested off.

In the first version of this tool, we disallowed overlapping time-off requests so that we could simply add up EndTime minus StartTime over all requests. Preventing overlaps makes this calculation very fast.

This has become problematic, because managers now want to schedule team meetings but cannot do so when someone has already asked for the day off.

So, in the new version of the tool, we have a requirement to allow overlapping requests.

Here is an example data set similar to ours:

UserId | StartTime | EndTime
----------------------------
 1     | 2:00      | 4:00
 1     | 3:00      | 5:00
 1     | 3:45      | 9:00
 2     | 6:00      | 9:00
 2     | 7:00      | 8:00
 3     | 2:00      | 3:00
 3     | 4:00      | 5:00
 4     | 1:00      | 7:00

The result that I need to get, as efficiently as possible, is this:

UserId | StartTime | EndTime
----------------------------
 1     | 2:00      | 9:00
 2     | 6:00      | 9:00
 3     | 2:00      | 3:00
 3     | 4:00      | 5:00
 4     | 1:00      | 7:00
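The following small Python sketch shows why overlaps break the original minute count. It is for illustration only, and `to_minutes` and `minutes_off` are hypothetical helpers. Summing EndTime minus StartTime over user 1's raw rows counts the overlapping stretches twice. Summing over the merged rows gives the true total.

```python
# Sample data from the question as (user_id, start, end) with "H:MM" times.
REQUESTS = [
    (1, "2:00", "4:00"), (1, "3:00", "5:00"), (1, "3:45", "9:00"),
    (2, "6:00", "9:00"), (2, "7:00", "8:00"),
    (3, "2:00", "3:00"), (3, "4:00", "5:00"),
    (4, "1:00", "7:00"),
]
# Expected merged rows, copied from the result table.
MERGED = [
    (1, "2:00", "9:00"), (2, "6:00", "9:00"),
    (3, "2:00", "3:00"), (3, "4:00", "5:00"),
    (4, "1:00", "7:00"),
]


def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def minutes_off(rows, user_id: int) -> int:
    """Sum EndTime - StartTime for one user (only correct if rows don't overlap)."""
    return sum(to_minutes(end) - to_minutes(start)
               for uid, start, end in rows if uid == user_id)


naive = minutes_off(REQUESTS, 1)   # overlapping minutes are counted twice
actual = minutes_off(MERGED, 1)    # 2:00 to 9:00
print(naive, actual)               # prints: 555 420
```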

We can easily detect overlaps with this query:

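select
    *
from
    requests r1
cross join
    requests r2
where
    r1.UserId = r2.UserId          -- only requests from the same user can conflict
  and
    r1.RequestId < r2.RequestId    -- report each pair once
  and
    r1.StartTime < r2.EndTime
  and
    r2.StartTime < r1.EndTime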

This is, in fact, how we were detecting and preventing the problems originally.
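The overlap test above is the usual interval check: two intervals overlap when each one starts before the other ends. Below is a hypothetical Python sketch of the same pair test. It compares only requests from the same user, because requests from different users never conflict. Times are converted to minutes after midnight. Because the comparison is strict `<`, back-to-back requests such as 3:00 to 5:00 and 5:00 to 6:00 are not reported as overlapping.

```python
from itertools import combinations

# Hypothetical rows: (request_id, user_id, start_minute, end_minute),
# i.e. the question's sample data with times converted to minutes.
ROWS = [
    (1, 1, 120, 240), (2, 1, 180, 300), (3, 1, 225, 540),
    (4, 2, 360, 540), (5, 2, 420, 480),
    (6, 3, 120, 180), (7, 3, 240, 300),
    (8, 4, 60, 420),
]


def overlaps(a, b) -> bool:
    """Same user, and each interval starts before the other one ends."""
    return a[1] == b[1] and a[2] < b[3] and b[2] < a[3]


def overlapping_pairs(rows):
    """Return (id1, id2) pairs with id1 < id2, like r1.RequestId < r2.RequestId."""
    return [(a[0], b[0]) for a, b in combinations(sorted(rows), 2) if overlaps(a, b)]


print(overlapping_pairs(ROWS))  # prints: [(1, 2), (1, 3), (2, 3), (4, 5)]
```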

Now, we are trying to merge the overlapping items, but I'm reaching the limits of my SQL ninja skills.

It wouldn't be too hard to come up with a method using temp tables, but we want to avoid this if at all possible.

Is there a set-based way to merge overlapping rows?

Edit:

It would also be acceptable for all of the rows to show up, as long as the overlapping portions are removed. For example, if someone wants off from three to five and from four to six, it would be acceptable to return two rows: either one from three to five and the next from five to six, OR one from three to four and the next from four to six.
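The first alternative above (three to five, then five to six) can be produced by a single pass over one user's requests sorted by start time. Each request keeps only the part that begins after the latest end seen so far. Here is a minimal Python sketch with a hypothetical function name. It is procedural rather than set-based, so it only illustrates the shape of the acceptable output.

```python
def split_without_overlap(intervals):
    """Trim one user's (start, end) intervals so that no two output pieces overlap.

    Intervals are processed in start order; each keeps only the part after the
    latest end seen so far, and intervals fully covered by earlier ones are dropped.
    """
    pieces = []
    covered_until = None  # latest end among the intervals processed so far
    for start, end in sorted(intervals):
        piece_start = start if covered_until is None else max(start, covered_until)
        if piece_start < end:
            pieces.append((piece_start, end))
        covered_until = end if covered_until is None else max(covered_until, end)
    return pieces


print(split_without_overlap([(3, 5), (4, 6)]))  # prints: [(3, 5), (5, 6)]
```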

Also, here is a little test bench:

DECLARE @requests TABLE
(
    RequestId int IDENTITY(1, 1) PRIMARY KEY,  -- referenced by the overlap and merge queries
    UserId int,
    StartTime time,
    EndTime time
)

INSERT INTO @requests (UserId, StartTime, EndTime) VALUES
(1, '2:00', '4:00'),
(1, '3:00', '5:00'),
(1, '3:45', '9:00'),
(2, '6:00', '9:00'),
(2, '7:00', '8:00'),
(3, '2:00', '3:00'),
(3, '4:00', '5:00'),
(4, '1:00', '7:00');


asked Dec 03 '11 01:12

John Gietzen

1 Answer

Complete Rewrite:

;WITH new_grp AS (
   SELECT r1.UserId, r1.StartTime
   FROM   @requests r1
   WHERE  NOT EXISTS (
          SELECT *
          FROM   @requests r2
          WHERE  r1.UserId = r2.UserId
          AND    r2.StartTime <  r1.StartTime
          AND    r2.EndTime   >= r1.StartTime)
   GROUP  BY r1.UserId, r1.StartTime -- there can be > 1
   ),r AS (
   SELECT r.RequestId, r.UserId, r.StartTime, r.EndTime
         ,count(*) AS grp -- guaranteed to be 1+
   FROM   @requests r
   JOIN   new_grp n ON n.UserId = r.UserId AND n.StartTime <= r.StartTime
   GROUP  BY r.RequestId, r.UserId, r.StartTime, r.EndTime
   )
SELECT min(RequestId) AS RequestId
      ,UserId
      ,min(StartTime) AS StartTime
      ,max(EndTime)   AS EndTime
FROM   r
GROUP  BY UserId, grp
ORDER  BY UserId, grp

The query now produces the requested result and covers all possible cases, including disjoint sub-groups and duplicates. See the comments on the test data in the working demo on data.SE.
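One way to sanity-check the query without SQL Server is to run it against the question's sample data in SQLite, using Python's built-in sqlite3 module. SQLite is only a stand-in here, so a few assumptions apply:

- The table variable becomes a regular table named `requests`.
- The table has an INTEGER PRIMARY KEY `RequestId` column, because the query reads it.
- Times are stored as zero-padded 'HH:MM' text, so that string comparison follows time order.
- The leading semicolon is dropped.

```python
import sqlite3

# In-memory stand-in for @requests; RequestId is assigned 1..8 in insert order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (RequestId INTEGER PRIMARY KEY, "
             "UserId INTEGER, StartTime TEXT, EndTime TEXT)")
conn.executemany(
    "INSERT INTO requests (UserId, StartTime, EndTime) VALUES (?, ?, ?)",
    [(1, "02:00", "04:00"), (1, "03:00", "05:00"), (1, "03:45", "09:00"),
     (2, "06:00", "09:00"), (2, "07:00", "08:00"),
     (3, "02:00", "03:00"), (3, "04:00", "05:00"),
     (4, "01:00", "07:00")])

# The answer's query with only the table name changed and the leading ';' removed.
QUERY = """
WITH new_grp AS (
   SELECT r1.UserId, r1.StartTime
   FROM   requests r1
   WHERE  NOT EXISTS (
          SELECT *
          FROM   requests r2
          WHERE  r1.UserId = r2.UserId
          AND    r2.StartTime <  r1.StartTime
          AND    r2.EndTime   >= r1.StartTime)
   GROUP  BY r1.UserId, r1.StartTime
   ), r AS (
   SELECT r.RequestId, r.UserId, r.StartTime, r.EndTime
         ,count(*) AS grp
   FROM   requests r
   JOIN   new_grp n ON n.UserId = r.UserId AND n.StartTime <= r.StartTime
   GROUP  BY r.RequestId, r.UserId, r.StartTime, r.EndTime
   )
SELECT min(RequestId) AS RequestId
      ,UserId
      ,min(StartTime) AS StartTime
      ,max(EndTime)   AS EndTime
FROM   r
GROUP  BY UserId, grp
ORDER  BY UserId, grp
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

It should return the five merged rows from the question, each tagged with the lowest RequestId of its group.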

CTE 1: Find the (unique!) points in time where a new group of overlapping intervals starts.
CTE 2: Count the new-group starts up to (and including) each individual interval, which yields a unique group number per user.
Final SELECT: Merge each group, taking the earliest start and the latest end.
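The three steps above can be mirrored procedurally, which makes the grouping logic easy to step through. This is a Python sketch, not the set-based query itself. `merge_requests` is a hypothetical name, rows are (request_id, user_id, start, end) tuples, and times are minutes after midnight.

```python
from collections import defaultdict

# Question's sample data as (request_id, user_id, start_minute, end_minute).
ROWS = [
    (1, 1, 120, 240), (2, 1, 180, 300), (3, 1, 225, 540),
    (4, 2, 360, 540), (5, 2, 420, 480),
    (6, 3, 120, 180), (7, 3, 240, 300),
    (8, 4, 60, 420),
]


def merge_requests(rows):
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[1]].append(row)

    merged = []
    for user_id in sorted(by_user):
        reqs = by_user[user_id]
        # Step 1 (CTE 1): distinct start times not covered by an interval
        # that starts earlier (r2.StartTime < start <= r2.EndTime).
        group_starts = sorted({
            s for _, _, s, _ in reqs
            if not any(s2 < s <= e2 for _, _, s2, e2 in reqs)
        })
        # Step 2 (CTE 2): group number = number of group starts <= the row's start.
        groups = defaultdict(list)
        for req in reqs:
            grp = sum(1 for g in group_starts if g <= req[2])
            groups[grp].append(req)
        # Step 3 (final SELECT): lowest id, earliest start, latest end per group.
        for grp in sorted(groups):
            members = groups[grp]
            merged.append((min(r[0] for r in members), user_id,
                           min(r[2] for r in members), max(r[3] for r in members)))
    return merged


print(merge_requests(ROWS))
# prints: [(1, 1, 120, 540), (4, 2, 360, 540), (6, 3, 120, 180), (7, 3, 240, 300), (8, 4, 60, 420)]
```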

I faced some difficulty because, in SQL Server 2008, aggregate window functions such as max() or sum() do not accept an ORDER BY clause in the OVER clause. They can only compute one value per partition, which makes it impossible to compute a running sum or count per partition. (SQL Server 2012 later added ORDER BY and row framing for aggregate window functions.) A running count would work in PostgreSQL or Oracle, but not in MySQL before version 8.0, which has neither window functions nor CTEs.

The final solution uses one extra CTE and should be just as fast.
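Some engines let aggregate window functions take ORDER BY: PostgreSQL, Oracle, SQL Server 2012 and later, and SQLite 3.25 and later. On those engines, the group number can instead be computed as a running count, the common "gaps and islands" pattern. First, flag a row when it starts after the latest end of all earlier rows for that user. Then take a running sum of the flags.

This is a swapped-in technique, not the answer's query. The sketch uses Python's sqlite3 module as a stand-in engine, with an assumed `requests` table: a RequestId column and zero-padded 'HH:MM' text times.

```python
import sqlite3

# Window functions need SQLite 3.25 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite 3.25+ is required for window functions")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (RequestId INTEGER PRIMARY KEY, "
             "UserId INTEGER, StartTime TEXT, EndTime TEXT)")
conn.executemany(
    "INSERT INTO requests (UserId, StartTime, EndTime) VALUES (?, ?, ?)",
    [(1, "02:00", "04:00"), (1, "03:00", "05:00"), (1, "03:45", "09:00"),
     (2, "06:00", "09:00"), (2, "07:00", "08:00"),
     (3, "02:00", "03:00"), (3, "04:00", "05:00"),
     (4, "01:00", "07:00")])

QUERY = """
WITH flagged AS (
   SELECT RequestId, UserId, StartTime, EndTime,
          -- 1 if this is the user's first row or it starts after every
          -- earlier row of the same user has ended, else 0
          CASE WHEN StartTime <= max(EndTime) OVER (
                    PARTITION BY UserId ORDER BY StartTime, RequestId
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               THEN 0 ELSE 1 END AS is_new_grp
   FROM   requests
   ), numbered AS (
   SELECT RequestId, UserId, StartTime, EndTime,
          -- running count of group starts = group number per user
          sum(is_new_grp) OVER (
              PARTITION BY UserId ORDER BY StartTime, RequestId
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
   FROM   flagged
   )
SELECT min(RequestId) AS RequestId, UserId,
       min(StartTime) AS StartTime, max(EndTime) AS EndTime
FROM   numbered
GROUP  BY UserId, grp
ORDER  BY UserId, grp
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This version needs one CTE per step but no self-join. Whether it is faster than the NOT EXISTS version on SQL Server 2012+ would have to be measured.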

answered Sep 30 '22 04:09

Erwin Brandstetter
