Is there a better way of merging overlapping date intervals?
The solution I came up with is so simple that now I wonder if someone else has a better idea of how this could be done.
/***** DATA EXAMPLE *****/ DECLARE @T TABLE (d1 DATETIME, d2 DATETIME) INSERT INTO @T (d1, d2) SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31' UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10' UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09' UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08' UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16' UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13' /***** INTERVAL ANALYSIS *****/ WHILE (1=1) BEGIN UPDATE t1 SET t1.d2 = t2.d2 FROM @T AS t1 INNER JOIN @T AS t2 ON DATEADD(day, 1, t1.d2) BETWEEN t2.d1 AND t2.d2 IF @@ROWCOUNT = 0 BREAK END /***** RESULT *****/ SELECT StartDate = MIN(d1) , EndDate = d2 FROM @T GROUP BY d2 ORDER BY StartDate, EndDate /***** OUTPUT *****/ /***** StartDate EndDate 2010-01-01 2010-06-13 2010-06-15 2010-08-16 2010-11-01 2010-12-31 *****/
A simple approach is to start from the first interval and compare it with all other intervals for overlapping, if it overlaps with any other interval, then remove the other interval from the list and merge the other into the first interval. Repeat the same steps for the remaining intervals after the first.
Let's take the following overlapping intervals example to explain the idea: If both ranges have at least one common point, then we say that they're overlapping. In other words, we say that two ranges and are overlapping if: On the other hand, non-overlapping ranges don't have any points in common.
I was looking for the same solution and came across this post on Combine overlapping datetime to return single overlapping range record.
There is another thread on Packing Date Intervals.
I tested this with various date ranges, including the ones listed here, and it works correctly every time.
SELECT s1.StartDate, --t1.EndDate MIN(t1.EndDate) AS EndDate FROM @T s1 INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate AND NOT EXISTS(SELECT * FROM @T t2 WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate) WHERE NOT EXISTS(SELECT * FROM @T s2 WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate) GROUP BY s1.StartDate ORDER BY s1.StartDate
The result is:
StartDate | EndDate 2010-01-01 | 2010-06-13 2010-06-15 | 2010-06-25 2010-06-26 | 2010-08-16 2010-11-01 | 2010-12-31
You asked this back in 2010 but don't specify any particular version.
An answer for people on SQL Server 2012+
WITH T1 AS (SELECT *, MAX(d2) OVER (ORDER BY d1) AS max_d2_so_far FROM @T), T2 AS (SELECT *, CASE WHEN d1 <= DATEADD(DAY, 1, LAG(max_d2_so_far) OVER (ORDER BY d1)) THEN 0 ELSE 1 END AS range_start FROM T1), T3 AS (SELECT *, SUM(range_start) OVER (ORDER BY d1) AS range_group FROM T2) SELECT range_group, MIN(d1) AS d1, MAX(d2) AS d2 FROM T3 GROUP BY range_group
Which returns
+-------------+------------+------------+ | range_group | d1 | d2 | +-------------+------------+------------+ | 1 | 2010-01-01 | 2010-06-13 | | 2 | 2010-06-15 | 2010-08-16 | | 3 | 2010-11-01 | 2010-12-31 | +-------------+------------+------------+
DATEADD(DAY, 1
is used because your desired results show you want a period ending on 2010-06-25
to be collapsed into one starting 2010-06-26
. For other use cases this may need adjusting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With