Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL to find time elapsed from multiple overlapping intervals

Not using MSSQL or DB2 or Oracle. No CTE. No OVERLAP predicate. No INTERVAL data type. The situation: on a vehicle to be repaired work can not start until all parts ordered for the job have been received. Parts may be ordered multiple times prior to the start of repair. We need to extract the time for which the vehicle was on "parts hold"

So for a vehicle identified as id = 1 parts were ordered (d1) and received (d2) on 4 different occasions

    ID     d1     d2
     1     8/1    8/8
     1     8/2    8/6
     1     8/12   8/14
     1     8/3    8/10

 8/1                             8/8
  d1                              d2   
  |-------------------------------|  
         8/2             8/6                    8/12      8/14                  
         d1               d2                     d1        d2     
          |---------------|                      |----------|    
                   8/3                 8/10
                   d1                    d2
                   |---------------------|   
 8/1                                                       8/14
  |---------------------------------------------------------|  = 13 days
                                        8/10    8/12
  |--------------------------------------|    +  |----------|  = parts hold  = 11 days

As seen from above, the wait time to start work (assuming 8/1 as the date from which the vehicle was available for work) was 13 days. The actual time spent waiting for parts was 11 days, which is the number we need to derive from the data. The actual datetime data will be timestamps from which we will extract hours, we used dates in this sample data for simplicity of presentation. We are struggling to generate a set (not psm, not udf, not cursor) based solution. TIA

like image 256
jon Avatar asked Aug 28 '11 23:08

jon


2 Answers

This SQL statement seems to get what you want (t is the table name of the sampe table):

SELECT
   d.id, 
   d.duration, 
   d.duration - 
   IFNULL(
      ( SELECT Sum( timestampdiff( SQL_TSI_DAY, 
                                   no_hold.d2, 
                                   ( SELECT min(d1) FROM t t4 
                                     WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 )))
        FROM ( SELECT DISTINCT id, d2 FROM t t1 
               WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) ) 
                       FROM t t2 WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0 
             And d2 <> ( select max( d2 ) from t t3 where t3.id = t1.id )) no_hold
        WHERE no_hold.id = d.id ),
      0 ) "parts hold"
FROM 
   ( SELECT id, timestampdiff( SQL_TSI_DAY, min( d1 ), max( d2 ) ) duration
     FROM t GROUP BY id ) d

The outer query gets the duration of the repair work. The complex subquery calculates the total number of days not waiting for parts. This is done by locating the start dates where the vehicle is not waiting for parts, and then count the number of days until it begins to wait for parts again:

// 1) The query for finding the starting dates when the vehicle is not waiting for parts, 
// i.e. finding all d2 that is not within any date range where the vehicle is waiting for part.
// The DISTINCT is needed to removed duplicate starting "no hold" period.

SELECT DISTINCT id, d2 
FROM t t1
WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) ) from t t2 
        WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0 AND 
      d2 <> ( SELECT max( d2 ) FROM t t3 WHERE t3.id = t1.id ) )

// 2) The days where it vehicle is not waiting for part is the date from the above query till the vehicle is // waiting for part again

timestampdiff( SQL_TSI_DAY, no_hold.d2, ( SELECT min(d1) FROM t t4 WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 ) )

Combining the two above and aggregating all such periods gives the number of days that the vehicle is not waiting for parts. The final query adds an extra condition to calculate result for each id from the outer query.

This probably is not terribly efficient on very large table with many ids. It should fine if the id is limited to one or just a few.

like image 175
Alex W Avatar answered Oct 05 '22 10:10

Alex W


I couldn't get @Alex W's queries to work. It is not standard SQL, so it required a lot of rewrite to be compatible with SQL Server (which I can test). But it did give me some inspiration, which I have expanded upon.


Find all start-points of every period of uninterrupted waiting:

SELECT DISTINCT
    t1.ID,
    t1.d1 AS date,
    -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2                   -- Join for any events occurring while this
    ON t2.ID = t1.ID                  -- is starting. If this is a start point,
    AND t2.d1 <> t1.d1                -- it won't match anything, which is what
    AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0

And the equivalent for end-points:

SELECT DISTINCT
    t1.ID,
    t1.d2 AS date,
    DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d2) AS n
FROM Orders t1
LEFT JOIN Orders t2
    ON t2.ID = t1.ID
    AND t2.d2 <> t1.d2
    AND t1.d2 BETWEEN t2.d1 AND t2.d2
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0

n is the number of days since some common point in time. Start-points have a negative value, and end-points have a positive value. This is so that we can just add them up to get the number of days in between.

span = end - start
span = end + (-start)
span1 + span2 = end1 + (-start1) + end2 + (-start2)

Finally, we just need to add things up:

SELECT ID, SUM(n) AS hold_days
FROM (
   SELECT DISTINCT
       t1.id,
       t1.d1 AS date,
       -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1)  AS n
   FROM Orders t1
   LEFT JOIN Orders t2
      ON t2.ID = t1.ID
      AND t2.d1 <> t1.d1
      AND t1.d1 BETWEEN t2.d1 AND t2.d2
   GROUP BY t1.ID, t1.d1, t1.d2
   HAVING COUNT(t2.ID) = 0
   UNION ALL
   SELECT DISTINCT
       t1.id,
       t1.d2 AS date,
       DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d2) AS n
   FROM Orders t1
   LEFT JOIN Orders t2
      ON t2.ID = t1.ID
      AND t2.d2 <> t1.d2
      AND t1.d2 BETWEEN t2.d1 AND t2.d2
   GROUP BY t1.ID, t1.d1, t1.d2
   HAVING COUNT(t2.ID) = 0
   ORDER BY ID, date
) s
GROUP BY ID;

Input table (Orders):

ID   d1           d2
 1   2011-08-01   2011-08-08
 1   2011-08-02   2011-08-06
 1   2011-08-03   2011-08-10
 1   2011-08-12   2011-08-14
 2   2011-08-01   2011-08-03
 2   2011-08-02   2011-08-06
 2   2011-08-05   2011-08-09

Output:

ID   hold_days
 1          11
 2           8

Alternatively, you can do this with a stored procedure.

CREATE PROCEDURE CalculateHoldTimes
    @ID int = 0
AS
BEGIN
    DECLARE Events CURSOR FOR
    SELECT *
    FROM (
        SELECT d1 AS date, 1 AS diff
        FROM Orders
        WHERE ID = @ID
        UNION ALL
        SELECT d2 AS date, -1 AS diff
        FROM Orders
        WHERE ID = @ID
    ) s
    ORDER BY date;

    DECLARE @Events_date date,
            @Events_diff int,
            @Period_start date,
            @Period_accum int,
            @Total_start date,
            @Total_count int;

    OPEN Events;

    FETCH NEXT FROM Events
    INTO @Events_date, @Events_diff;

    SET @Period_start = @Events_date;
    SET @Period_accum = 0;
    SET @Total_start = @Events_date;
    SET @Total_count = 0;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Period_accum = @Period_accum + @Events_diff;

        IF @Period_accum = 1 AND @Events_diff = 1
            -- Start of period
            SET @Period_start = @Events_date;
        ELSE IF @Period_accum = 0 AND @Events_diff = -1
            -- End of period
            SET @Total_count = @Total_count +
                DATEDIFF(day, @Period_start, @Events_date);

        FETCH NEXT FROM Events
        INTO @Events_date, @Events_diff;
    END;

    SELECT
        @Total_start AS d1,
        @Events_date AS d2,
        @Total_count AS hold_time;
END;

Call it with:

EXEC CalculateHoldTimes 1;
like image 43
Markus Jarderot Avatar answered Oct 05 '22 12:10

Markus Jarderot