Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL - Merging two tables based off of two different time lines

I'm struggling to come up with a way to join/merge two tables based off date ranges to create a single table with one time line.

Stripped down versions of the tables with some sample data:

Table 1
---------------------------------
ID    Start      End       State
1     10:00      10:05     A
2     10:23      10:24     B
3     10:32      10:40     A
4     14:00      14:15     C


Table 2
---------------------------------
ID    Start      End       State
1     10:22      10:27     X
2     11:00      11:20     Y
3     12:05      14:30     Z

The two tables are maintained by two different processes, and while within each individual table events will never overlap, there could be overlap between the two tables.

The events in the first table should take precedence over those in the second. That is, basically select everything from the first and fill in the gaps with the second.

Based on the example above, the expected query output would be:

Start      End       State
10:00      10:05     A
10:22      10:23     X
10:23      10:24     B
10:24      10:27     X
10:32      10:40     A
11:00      11:20     Y
12:05      14:00     Z
14:00      14:15     C
14:15      14:30     Z

Notice how state Z from Table 2 is split up by state C from Table 1.

I saw a few examples where tables were joined based off of single time stamps, or possibly one table with a date range and another with a single time stamp. I've yet to see something of this nature, and as my experience level is fairly basic my thoughts are going in circles.

Thanks for any advice, and if I come up with any breakthroughs I will be sure to update this.

UPDATE

Thanks to Gordon, here is the solution I'm using (very slightly modified version of what he had):

---------------------------------------------------------------------------------------------
-- Setup Table 1                                                                           --
---------------------------------------------------------------------------------------------
DECLARE @Table1 TABLE(ID INT, Start DATETIMEOFFSET(7), [End] DATETIMEOFFSET(7), [State] CHAR)

INSERT  @Table1
VALUES  (1, '2013-12-21 10:00:00 +00:00', '2013-12-21 10:05:00 +00:00', 'A'),
        (2, '2013-12-21 10:23:00 +00:00', '2013-12-21 10:24:00 +00:00', 'B'),
        (3, '2013-12-21 10:32:00 +00:00', '2013-12-21 10:40:00 +00:00', 'A'),
        (4, '2013-12-21 14:00:00 +00:00', '2013-12-21 14:15:00 +00:00', 'C')

SELECT * FROM @Table1

---------------------------------------------------------------------------------------------
-- Setup Table 2                                                                           --
---------------------------------------------------------------------------------------------
DECLARE @Table2 TABLE (ID INT, Start DATETIMEOFFSET(7), [End] DATETIMEOFFSET(7), [State] CHAR)

INSERT  @Table2
VALUES  (1, '2013-12-21 10:22:00 +00:00', '2013-12-21 10:27:00 +00:00', 'X'),
        (2, '2013-12-21 11:00:00 +00:00', '2013-12-21 11:20:00 +00:00', 'Y'),
        (3, '2013-12-21 12:05:00 +00:00', '2013-12-21 14:30:00 +00:00', 'Z')

SELECT * FROM @Table2

---------------------------------------------------------------------------------------------
-- Merge Tables                                                                            --
---------------------------------------------------------------------------------------------
;WITH StateChangeTimes AS (
    SELECT  DISTINCT TheTime
    FROM    (SELECT T1.Start AS TheTime, T1.[State]
             FROM   @Table1 T1
             UNION  ALL
             SELECT T1.[End], NULL
             FROM   @Table1 T1
             UNION  ALL
             SELECT T2.Start, T2.[State]
             FROM   @Table2 T2
             UNION  ALL
             SELECT T2.[End], NULL
             FROM   @Table2 T2) T ),
TimePairs AS (
    SELECT   TheTime AS Start,
            (SELECT MIN(SCT2.TheTime)
             FROM   StateChangeTimes SCT2
             WHERE  SCT2.thetime > SCT.TheTime) AS [End]
    FROM     StateChangeTimes SCT)
SELECT  Start,
        [End],
        COALESCE(T1State, T2State) AS [State]
FROM    (SELECT Start,
                [End],
                (SELECT TOP 1
                        T1.[State]
                 FROM   @Table1 T1
                 WHERE  TP.Start >= T1.Start AND TP.[End] <= T1.[End]
                 ORDER BY T1.Start DESC) T1State,
                (SELECT TOP 1
                        T2.[State]
                 FROM   @Table2 T2
                 WHERE  TP.Start >= T2.Start AND TP.[End] <= T2.[End]
                 ORDER BY T2.Start DESC) T2State
         FROM TimePairs TP) TP2
ORDER BY Start;

The main changes I made:

  • In the where clauses for the select statements that grabs the states that get coalesced, I made the End time stamps inclusive.

  • Added in a TOP 1 qualifier to the first state select statement as mentioned in the first bullet point.

like image 408
manderson Avatar asked Oct 20 '22 18:10

manderson


1 Answers

I think this is a hard problem.

Here is the way that I'm thinking about it. Get a list of all the start and end times. This provides the "boundaries" for the different time periods. Next, lookup the state during each time period.

The rule of the state for a given time period is:

  1. if the time period start is the start from Table1, then use that state.
  2. if the time period start is greater than the start from Table1 and less then the end, then use the state from Table1. This is part of the precedence rule and occurs when a table2 state overlaps with a table1 state.
  3. otherwise use the state from Table2.

(Rule 2 actually encompasses rule 1.)

The next problem is implementing this in SQL. The approach is to get the individual time periods, combine them into pairs and then do the lookup. Because SQL Server 2008 lacks functions like lag() and cumulative sums (which are in SQL Server 2012), the code uses correlated subqueries instead.

with StateChangeTimes as (
      select distinct thetime
      from (select start as thetime, state
            from Table1 t1
            union all
            select end, NULL
            from Table1 t1
            union all
            select start, state
            from Table2 t2
            union all
            select end, NULL
            from Table2 t2
           ) t
    ),
     timepairs as (
     select thetime as start,
            (select min(thetime)
             from StateChangeTimes sct2
             where sct2.thetime > sct.thetime
            ) as end
     from StateChangeTimes sct 
    )
select start, end, coalesce(t1State, t2State) as state
from (select start, end,
             (select t1.state
              from Table1 t1
              where tp.start >= t1.start and tp.end < t1.end
             ) t1State,
             (select t2.state
              from Table2 t2
              where t2.start <= tp.start
              order by t2.start desc
             ) t2State
      from timepairs tp
     ) tp
order by start;

I am not sure if this code will work when the same time appears in both tables. Also, it will generate an error if Table1 times overlap. This is fixed relatively easily, but it did not seem to be a requirement.

like image 191
Gordon Linoff Avatar answered Oct 24 '22 00:10

Gordon Linoff