I'm struggling to come up with a way to join/merge two tables based off date ranges to create a single table with one time line.
Stripped down versions of the tables with some sample data:
Table 1
---------------------------------
ID Start End State
1 10:00 10:05 A
2 10:23 10:24 B
3 10:32 10:40 A
4 14:00 14:15 C
Table 2
---------------------------------
ID Start End State
1 10:22 10:27 X
2 11:00 11:20 Y
3 12:05 14:30 Z
The two tables are maintained by two different processes, and while within each individual table events will never overlap, there could be overlap between the two tables.
The events in the first table should take precedence over those in the second. That is, basically select everything from the first and fill in the gaps with the second.
Based on the example above, the expected query output would be:
Start End State
10:00 10:05 A
10:22 10:23 X
10:23 10:24 B
10:24 10:27 X
10:32 10:40 A
11:00 11:20 Y
12:05 14:00 Z
14:00 14:15 C
14:15 14:30 Z
Notice how state Z from Table 2 is split up by state C from Table 1.
I saw a few examples where tables were joined based off of single time stamps, or possibly one table with a date range and another with a single time stamp. I've yet to see something of this nature, and as my experience level is fairly basic my thoughts are going in circles.
Thanks for any advice, and if I come up with any breakthroughs I will be sure to update this.
UPDATE
Thanks to Gordon, here is the solution I'm using (very slightly modified version of what he had):
---------------------------------------------------------------------------------------------
-- Setup Table 1 --
---------------------------------------------------------------------------------------------
DECLARE @Table1 TABLE(ID INT, Start DATETIMEOFFSET(7), [End] DATETIMEOFFSET(7), [State] CHAR)
INSERT @Table1
VALUES (1, '2013-12-21 10:00:00 +00:00', '2013-12-21 10:05:00 +00:00', 'A'),
(2, '2013-12-21 10:23:00 +00:00', '2013-12-21 10:24:00 +00:00', 'B'),
(3, '2013-12-21 10:32:00 +00:00', '2013-12-21 10:40:00 +00:00', 'A'),
(4, '2013-12-21 14:00:00 +00:00', '2013-12-21 14:15:00 +00:00', 'C')
SELECT * FROM @Table1
---------------------------------------------------------------------------------------------
-- Setup Table 2 --
---------------------------------------------------------------------------------------------
DECLARE @Table2 TABLE (ID INT, Start DATETIMEOFFSET(7), [End] DATETIMEOFFSET(7), [State] CHAR)
INSERT @Table2
VALUES (1, '2013-12-21 10:22:00 +00:00', '2013-12-21 10:27:00 +00:00', 'X'),
(2, '2013-12-21 11:00:00 +00:00', '2013-12-21 11:20:00 +00:00', 'Y'),
(3, '2013-12-21 12:05:00 +00:00', '2013-12-21 14:30:00 +00:00', 'Z')
SELECT * FROM @Table2
---------------------------------------------------------------------------------------------
-- Merge Tables --
---------------------------------------------------------------------------------------------
;WITH StateChangeTimes AS (
SELECT DISTINCT TheTime
FROM (SELECT T1.Start AS TheTime, T1.[State]
FROM @Table1 T1
UNION ALL
SELECT T1.[End], NULL
FROM @Table1 T1
UNION ALL
SELECT T2.Start, T2.[State]
FROM @Table2 T2
UNION ALL
SELECT T2.[End], NULL
FROM @Table2 T2) T ),
TimePairs AS (
SELECT TheTime AS Start,
(SELECT MIN(SCT2.TheTime)
FROM StateChangeTimes SCT2
WHERE SCT2.thetime > SCT.TheTime) AS [End]
FROM StateChangeTimes SCT)
SELECT Start,
[End],
COALESCE(T1State, T2State) AS [State]
FROM (SELECT Start,
[End],
(SELECT TOP 1
T1.[State]
FROM @Table1 T1
WHERE TP.Start >= T1.Start AND TP.[End] <= T1.[End]
ORDER BY T1.Start DESC) T1State,
(SELECT TOP 1
T2.[State]
FROM @Table2 T2
WHERE TP.Start >= T2.Start AND TP.[End] <= T2.[End]
ORDER BY T2.Start DESC) T2State
FROM TimePairs TP) TP2
ORDER BY Start;
The main changes I made:
In the where clauses for the select statements that grabs the states that get coalesced, I made the End time stamps inclusive.
Added in a TOP 1 qualifier to the first state select statement as mentioned in the first bullet point.
I think this is a hard problem.
Here is the way that I'm thinking about it. Get a list of all the start and end times. This provides the "boundaries" for the different time periods. Next, lookup the state during each time period.
The rule of the state for a given time period is:
(Rule 2 actually encompasses rule 1.)
The next problem is implementing this in SQL. The approach is to get the individual time periods, combine them into pairs and then do the lookup. Because SQL Server 2008 lacks functions like lag()
and cumulative sums (which are in SQL Server 2012), the code uses correlated subqueries instead.
with StateChangeTimes as (
select distinct thetime
from (select start as thetime, state
from Table1 t1
union all
select end, NULL
from Table1 t1
union all
select start, state
from Table2 t2
union all
select end, NULL
from Table2 t2
) t
),
timepairs as (
select thetime as start,
(select min(thetime)
from StateChangeTimes sct2
where sct2.thetime > sct.thetime
) as end
from StateChangeTimes sct
)
select start, end, coalesce(t1State, t2State) as state
from (select start, end,
(select t1.state
from Table1 t1
where tp.start >= t1.start and tp.end < t1.end
) t1State,
(select t2.state
from Table2 t2
where t2.start <= tp.start
order by t2.start desc
) t2State
from timepairs tp
) tp
order by start;
I am not sure if this code will work when the same time appears in both tables. Also, it will generate an error if Table1 times overlap. This is fixed relatively easily, but it did not seem to be a requirement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With