Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting intersecting time intervals in T-SQL

CODE:

CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME)
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME)

INSERT INTO #Temp1 (CoachID, BusyST, BusyET)
SELECT 1,'2016-08-17 09:12:00','2016-08-17 10:11:00'
UNION
SELECT 3,'2016-08-17 09:30:00','2016-08-17 10:00:00'
UNION
SELECT 4,'2016-08-17 12:07:00','2016-08-17 13:10:00'

INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET)
SELECT 1,'2016-08-17 09:07:00','2016-08-17 11:09:00'
UNION
SELECT 2,'2016-08-17 09:11:00','2016-08-17 09:30:00'
UNION
SELECT 3,'2016-08-17 09:24:00','2016-08-17 13:08:00'
UNION
SELECT 1,'2016-08-17 11:34:00','2016-08-17 12:27:00'
UNION
SELECT 4,'2016-08-17 09:34:00','2016-08-17 13:00:00'
UNION
SELECT 5,'2016-08-17 09:10:00','2016-08-17 09:55:00'

--RESULT-SET QUERY GOES HERE

DROP TABLE #Temp1
DROP TABLE #Temp2

DESIRED OUTPUT:

CoachID CanCoachST                  CanCoachET                  NumOfCoaches
1       2016-08-17 09:12:00.000     2016-08-17 09:24:00.000     2 --(ID2 = 2,5)
1       2016-08-17 09:24:00.000     2016-08-17 09:30:00.000     3 --(ID2 = 2,3,5)
1       2016-08-17 09:30:00.000     2016-08-17 09:34:00.000     1 --(ID2 = 5)
1       2016-08-17 09:34:00.000     2016-08-17 09:55:00.000     2 --(ID2 = 4,5)
1       2016-08-17 09:55:00.000     2016-08-17 10:00:00.000     1 --(ID2 = 4)
1       2016-08-17 10:00:00.000     2016-08-17 10:11:00.000     2 --(ID2 = 3,4)
3       2016-08-17 09:30:00.000     2016-08-17 09:34:00.000     1 --(ID2 = 5)
3       2016-08-17 09:34:00.000     2016-08-17 09:55:00.000     2 --(ID2 = 4,5)
3       2016-08-17 09:55:00.000     2016-08-17 10:00:00.000     1 --(ID2 = 4)
4       2016-08-17 12:07:00.000     2016-08-17 12:27:00.000     2 --(ID2 = 1,3)
4       2016-08-17 12:27:00.000     2016-08-17 13:08:00.000     1 --(ID2 = 3)
4       2016-08-17 13:08:00.000     2016-08-17 13:10:00.000     0 --(No one is available)

GOAL: Consider #Temp1 as the table of team coaches (ID1) and their meeting times (ST1 = Meeting Start Time and ET1 = Meeting End Time). Consider #Temp2 as the table of team coaches (ID2) and their total available times (ST2 = Available Start Time and ET2 = Available End Time).

Now, the goal is to find all possible coaches from #Temp2 who are available to coach during the meeting time of the coaches from #Temp1.

So for example, For the coach ID1 = 1, who is busy between 9:12 and 10:11 (data can span across multiple days, if that info matters), we have coach ID2 = 2 and 5 that can coach between 9:12 and 9:24 , coach ID2 = 2,3, and 5 that can coach between 9:24 and 9:30 , coach ID2 = 5 that can coach between 9:30 and 9:34 , coach ID2 = 4 and 5 that can coach between 9:34 and 9:55 , coach ID2 = 4 that can coach between 9:55 and 10:00 , and coach ID2 = 3 and 4 that can coach between 10:00 and 10:11 (note how ID 3, although available in #Temp2 table between 9:24 and 13:08, it can't coach for ID1 = 1 between 9:24 and 10:00 because its also busy between 9:30 and 10:00.

My effort so far: Only dealing with breaking #Temp1's time bracket so far. Still need to figure out A) how to remove that non-busy time window from the output B) add a field/map it to right T1's CoachIDs.

;WITH ED
AS (SELECT BusyET, CoachID FROM #Temp1  
    UNION ALL   
    SELECT BusyST, CoachID FROM #Temp1
    )
,Brackets
AS (SELECT MIN(BusyST) AS BusyST
        ,(  SELECT MIN(BusyET)
            FROM ED e
            WHERE e.BusyET > MIN(BusyST)
            ) AS BusyET
    FROM #Temp1 T   
    UNION ALL   
    SELECT B.BusyET
        ,e.BusyET
    FROM Brackets B
    INNER JOIN ED E ON B.BusyET < E.BusyET
    WHERE NOT EXISTS (
            SELECT *
            FROM ED E2
            WHERE E2.BusyET > B.BusyET
                AND E2.BusyET < E.BusyET
            )
    )
SELECT *
FROM Brackets
ORDER BY BusyST;

I think I need to join on comparing ST/ET dates between two tables where IDs don't match each other. But I'm having trouble figuring out how to actually get only the meeting time window and unique count.

Updated with better schema/data-set. Also note, even though CoachID 4 is not "scheduled" to be available, he's still listed as busy for that last few minutes. And there can be scenario where no one else is available to work during that time, in which case, we can return 0 cnt record (or not return it if it's really complicated).

Again, the goal is to find count and combination of all available CoachIDs and their Available time window that can coach for the CoachIDs listed in the busy table.

Updated with more sample description matching sample data.

like image 963
007 Avatar asked Aug 18 '16 03:08

007


2 Answers

The query in this answer was inspired by the Packing Intervals by Itzik Ben-Gan.


At first I didn't understand the full complexity of the requirements and assumed that intervals in Table1 and Table2 don't overlap. I assumed that the same coach can't be busy and available at the same time.

It turns out that my assumption was wrong, so the first variant of the query that I'm leaving below has to be extended with preliminary step that subtracts all intervals stored in Table1 from intervals stored in Table2.

It uses the similar idea. Each start of the "available" interval is marked with +1 EventType and end of the "available" interval is marked with -1 EventType. For "busy" intervals the marks are reversed. "Busy" interval starts with -1 and ends with +1. This is done in C1_Subtract.

Then running total tells us where the "truly" available intervals are (C2_Subtract). Finally, CTE_Available leaves only "truly" available intervals.

Sample data

I added few rows to illustrate what happens if no coaches are available. I also added CoachID=9, which is not in the initial results of the first variant of the query.

CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME);
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME);
-- Start time is inclusive
-- End time is exclusive

INSERT INTO #Temp1 (CoachID, BusyST, BusyET) VALUES
(1, '2016-08-17 09:12:00','2016-08-17 10:11:00'),
(3, '2016-08-17 09:30:00','2016-08-17 10:00:00'),
(4, '2016-08-17 12:07:00','2016-08-17 13:10:00'),

(6, '2016-08-17 15:00:00','2016-08-17 16:00:00'),
(9, '2016-08-17 15:00:00','2016-08-17 16:00:00');

INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET) VALUES
(1,'2016-08-17 09:07:00','2016-08-17 11:09:00'),
(2,'2016-08-17 09:11:00','2016-08-17 09:30:00'),
(3,'2016-08-17 09:24:00','2016-08-17 13:08:00'),
(1,'2016-08-17 11:34:00','2016-08-17 12:27:00'),
(4,'2016-08-17 09:34:00','2016-08-17 13:00:00'),
(5,'2016-08-17 09:10:00','2016-08-17 09:55:00'),

(7,'2016-08-17 15:10:00','2016-08-17 15:20:00'),
(8,'2016-08-17 15:15:00','2016-08-17 15:25:00'),
(7,'2016-08-17 15:40:00','2016-08-17 15:55:00'),
(9,'2016-08-17 15:05:00','2016-08-17 15:07:00'),
(9,'2016-08-17 15:40:00','2016-08-17 16:55:00');

Intermediate results of CTE_Available

+---------+-------------------------+-------------------------+
| CoachID |       AvailableST       |       AvailableET       |
+---------+-------------------------+-------------------------+
|       1 | 2016-08-17 09:07:00.000 | 2016-08-17 09:12:00.000 |
|       1 | 2016-08-17 10:11:00.000 | 2016-08-17 11:09:00.000 |
|       1 | 2016-08-17 11:34:00.000 | 2016-08-17 12:27:00.000 |
|       2 | 2016-08-17 09:11:00.000 | 2016-08-17 09:30:00.000 |
|       3 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |
|       3 | 2016-08-17 10:00:00.000 | 2016-08-17 13:08:00.000 |
|       4 | 2016-08-17 09:34:00.000 | 2016-08-17 12:07:00.000 |
|       5 | 2016-08-17 09:10:00.000 | 2016-08-17 09:55:00.000 |
|       7 | 2016-08-17 15:10:00.000 | 2016-08-17 15:20:00.000 |
|       7 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |
|       8 | 2016-08-17 15:15:00.000 | 2016-08-17 15:25:00.000 |
|       9 | 2016-08-17 16:00:00.000 | 2016-08-17 16:55:00.000 |
+---------+-------------------------+-------------------------+

Now we can use these intermediate results of CTE_Available instead of #Temp2 in the first variant of the query. See detailed explanations below the first variant of the query.

Full query

WITH
C1_Subtract
AS
(
    SELECT
        CoachID
        ,AvailableST AS ts
        ,+1 AS EventType
    FROM #Temp2

    UNION ALL

    SELECT
        CoachID
        ,AvailableET AS ts
        ,-1 AS EventType
    FROM #Temp2

    UNION ALL

    SELECT
        CoachID
        ,BusyST AS ts
        ,-1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID
        ,BusyET AS ts
        ,+1 AS EventType
    FROM #Temp1
)
,C2_Subtract AS
(
    SELECT
        C1_Subtract.*
        ,SUM(EventType)
            OVER (
            PARTITION BY CoachID
            ORDER BY ts, EventType DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY CoachID
            ORDER BY ts, EventType DESC)
        AS NextTS
    FROM C1_Subtract
)
,CTE_Available
AS
(
    SELECT
        C2_Subtract.CoachID
        ,C2_Subtract.ts AS AvailableST
        ,C2_Subtract.NextTS AS AvailableET
    FROM C2_Subtract
    WHERE cnt > 0
)
,CTE_Intervals
AS
(
    SELECT
        TBusy.CoachID AS BusyCoachID
        ,TBusy.BusyST
        ,TBusy.BusyET
        ,CA.CoachID AS AvailableCoachID
        ,CA.AvailableST
        ,CA.AvailableET
        -- max of start time
        ,CASE WHEN CA.AvailableST < TBusy.BusyST
        THEN TBusy.BusyST
        ELSE CA.AvailableST 
        END AS ST
        -- min of end time
        ,CASE WHEN CA.AvailableET > TBusy.BusyET
        THEN TBusy.BusyET
        ELSE CA.AvailableET
        END AS ET
    FROM
        #Temp1 AS TBusy
        CROSS APPLY
        (
            SELECT
                TAvailable.*
            FROM
                CTE_Available AS TAvailable
            WHERE
                -- the same coach can't be available and busy
                TAvailable.CoachID <> TBusy.CoachID
                -- intervals intersect
                AND TAvailable.AvailableST < TBusy.BusyET
                AND TAvailable.AvailableET > TBusy.BusyST
        ) AS CA
)
,C1 AS
(
    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ST AS ts
        ,+1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ET AS ts
        ,-1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyST AS ts
        ,+1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyET AS ts
        ,-1 AS EventType
    FROM #Temp1
)
,C2 AS
(
    SELECT
        C1.*
        ,SUM(EventType)
            OVER (
            PARTITION BY BusyCoachID
            ORDER BY ts, EventType DESC, AvailableCoachID
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        - 1 AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY BusyCoachID 
            ORDER BY ts, EventType DESC, AvailableCoachID) 
        AS NextTS
    FROM C1
)
SELECT
    BusyCoachID AS CoachID
    ,ts AS CanCoachST
    ,NextTS AS CanCoachET
    ,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
ORDER BY BusyCoachID, CanCoachST
;

Final result

+---------+-------------------------+-------------------------+--------------+
| CoachID |       CanCoachST        |       CanCoachET        | NumOfCoaches |
+---------+-------------------------+-------------------------+--------------+
|       1 | 2016-08-17 09:12:00.000 | 2016-08-17 09:24:00.000 |            2 |
|       1 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |            3 |
|       1 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 |            1 |
|       1 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 |            2 |
|       1 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 |            1 |
|       1 | 2016-08-17 10:00:00.000 | 2016-08-17 10:11:00.000 |            2 |
|       3 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 |            1 |
|       3 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 |            2 |
|       3 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 |            1 |
|       4 | 2016-08-17 12:07:00.000 | 2016-08-17 12:27:00.000 |            2 |
|       4 | 2016-08-17 12:27:00.000 | 2016-08-17 13:08:00.000 |            1 |
|       4 | 2016-08-17 13:08:00.000 | 2016-08-17 13:10:00.000 |            0 |
|       6 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 |            0 |
|       6 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 |            1 |
|       6 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 |            2 |
|       6 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 |            1 |
|       6 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 |            0 |
|       6 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |            1 |
|       6 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 |            0 |
|       9 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 |            0 |
|       9 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 |            1 |
|       9 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 |            2 |
|       9 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 |            1 |
|       9 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 |            0 |
|       9 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |            1 |
|       9 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 |            0 |
+---------+-------------------------+-------------------------+--------------+

I'd recommend to create the following indexes to avoid some Sorts in the execution plan.

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_BusyST] ON #Temp1
(
    CoachID ASC,
    BusyST ASC
);

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_BusyET] ON #Temp1
(
    CoachID ASC,
    BusyET ASC
);

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_AvailableST] ON #Temp2
(
    CoachID ASC,
    AvailableST ASC
);

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_AvailableET] ON #Temp2
(
    CoachID ASC,
    AvailableET ASC
);

On real data, though, the bottleneck may be somewhere else, which may depend on the data distribution. The query is rather complicated and tuning it without real data would be too much guesswork.


First variant of the query

Run the query step-by-step, CTE-by-CTE and examine intermediate results to undestand how it works.

CTE_Intervals gives us a list of available intervals that intersect with busy intervals. C1 puts both start and end times in the same column with the corresponding EventType. This will help us track when an interval starts or ends. A running total of EventType gives the count of available coaches. C1 unions busy coaches into the mix to properly count cases when no coach is available.

WITH
CTE_Intervals
AS
(
    SELECT
        TBusy.CoachID AS BusyCoachID
        ,TBusy.BusyST
        ,TBusy.BusyET
        ,CA.CoachID AS AvailableCoachID
        ,CA.AvailableST
        ,CA.AvailableET
        -- max of start time
        ,CASE WHEN CA.AvailableST < TBusy.BusyST
        THEN TBusy.BusyST
        ELSE CA.AvailableST 
        END AS ST
        -- min of end time
        ,CASE WHEN CA.AvailableET > TBusy.BusyET
        THEN TBusy.BusyET
        ELSE CA.AvailableET
        END AS ET
    FROM
        #Temp1 AS TBusy
        CROSS APPLY
        (
            SELECT
                TAvailable.*
            FROM
                #Temp2 AS TAvailable
            WHERE
                -- the same coach can't be available and busy
                TAvailable.CoachID <> TBusy.CoachID
                -- intervals intersect
                AND TAvailable.AvailableST < TBusy.BusyET
                AND TAvailable.AvailableET > TBusy.BusyST
        ) AS CA
)
,C1 AS
(
    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ST AS ts
        ,+1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ET AS ts
        ,-1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyST AS ts
        ,+1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyET AS ts
        ,-1 AS EventType
    FROM #Temp1
)
,C2 AS
(
    SELECT
        C1.*
        ,SUM(EventType)
            OVER (
            PARTITION BY BusyCoachID
            ORDER BY ts, EventType DESC, AvailableCoachID
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        - 1 AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY BusyCoachID 
            ORDER BY ts, EventType DESC, AvailableCoachID) 
        AS NextTS
    FROM C1
)
SELECT
    BusyCoachID AS CoachID
    ,ts AS CanCoachST
    ,NextTS AS CanCoachET
    ,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
ORDER BY BusyCoachID, CanCoachST
;

DROP TABLE #Temp1;
DROP TABLE #Temp2;

Result

I've added comments for each line with IDs of available coaches that were counted.

Now I understand why my initial result was not the same as your expected result.

+---------+---------------------+---------------------+--------------+
| CoachID |       CanCoachST    |       CanCoachET    | NumOfCoaches |
+---------+---------------------+---------------------+--------------+
|       1 | 2016-08-17 09:12:00 | 2016-08-17 09:24:00 |            2 |  2,5
|       1 | 2016-08-17 09:24:00 | 2016-08-17 09:30:00 |            3 |  2,3,5
|       1 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 |            2 |  3,5
|       1 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 |            3 |  3,4,5
|       1 | 2016-08-17 09:55:00 | 2016-08-17 10:11:00 |            2 |  3,4
|       3 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 |            2 |  1,5
|       3 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 |            3 |  1,4,5
|       3 | 2016-08-17 09:55:00 | 2016-08-17 10:00:00 |            2 |  1,4
|       4 | 2016-08-17 12:07:00 | 2016-08-17 12:27:00 |            2 |  3,1
|       4 | 2016-08-17 12:27:00 | 2016-08-17 13:08:00 |            1 |  3
|       4 | 2016-08-17 13:08:00 | 2016-08-17 13:10:00 |            0 |  none
|       6 | 2016-08-17 15:00:00 | 2016-08-17 15:10:00 |            0 |  none
|       6 | 2016-08-17 15:10:00 | 2016-08-17 15:15:00 |            1 |  7
|       6 | 2016-08-17 15:15:00 | 2016-08-17 15:20:00 |            2 |  7,8
|       6 | 2016-08-17 15:20:00 | 2016-08-17 15:25:00 |            1 |  8
|       6 | 2016-08-17 15:25:00 | 2016-08-17 15:40:00 |            0 |  none
|       6 | 2016-08-17 15:40:00 | 2016-08-17 15:55:00 |            1 |  7
|       6 | 2016-08-17 15:55:00 | 2016-08-17 16:00:00 |            0 |  none
+---------+---------------------+---------------------+--------------+
like image 96
Vladimir Baranov Avatar answered Oct 30 '22 14:10

Vladimir Baranov


This query will do the calculations:

SELECT TT1.ID1
 , case when TT2.ST2 < TT1.ST1 THEN TT1.ST1 ELSE TT2.ST2 END
 , case when TT2.ET2 > TT1.ET1 THEN TT1.ET1 ELSE TT2.ET2 END
 , COUNT(distinct TT2.id2) 
FROM #Temp1 TT1 INNER JOIN #Temp2 TT2
ON TT1.ET1 > TT2.ST2 AND TT1.ST1 < TT2.ET2 AND TT1.ID1 <> TT2.ID2
GROUP BY TT1.ID1
 , case when TT2.ST2 < TT1.ST1 THEN TT1.ST1 ELSE TT2.ST2 END
 , case when TT2.ET2 > TT1.ET1 THEN TT1.ET1 ELSE TT2.ET2 END

However, the result will include the slots where coaches cal fill in for the full time slot, e.g for the Coach 1 there will be three slots: from 9:00 to 9:30 with substitute coach #2, from 9:30 to 10:00 substitute coach #4 and the timeslot from 9:00 to 10:00 with substitute coaches #3 and #4. Here is the whole result:

ID1                                                         
----------- ----------------------- ----------------------- -----------
1           2016-08-17 09:00:00.000 2016-08-17 09:30:00.000 1
1           2016-08-17 09:00:00.000 2016-08-17 10:00:00.000 2
1           2016-08-17 09:30:00.000 2016-08-17 10:00:00.000 1
3           2016-08-17 09:30:00.000 2016-08-17 10:00:00.000 3
4           2016-08-17 12:00:00.000 2016-08-17 12:30:00.000 1
4           2016-08-17 12:00:00.000 2016-08-17 13:00:00.000 1
like image 28
cha Avatar answered Oct 30 '22 15:10

cha