Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine consecutive date ranges

Using SQL Server 2008 R2,

I'm trying to combine date ranges into the maximum date range given that one end date is next to the following start date.

The data is about different employments. Some employees may have ended their employment and have rejoined at a later time. Those should count as two different employments (example ID 5). Some people have different types of employment, running after each other (enddate and startdate neck-to-neck), in this case it should be considered as one employment in total (example ID 30).

An employment period that has not ended has an enddate that is null.

Some examples is probably enlightening:

declare @t as table  (employmentid int, startdate datetime, enddate datetime)

insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)

-- expected outcome
EmploymentId StartDate   EndDate
5            2007-12-03  2011-08-26
5            2013-05-02  NULL
30           2006-10-02  NULL
66           2007-09-24  NULL

I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.

like image 609
Jonas Lincoln Avatar asked Apr 03 '13 09:04

Jonas Lincoln


1 Answers

An alternative solution that uses window functions rather than recursive CTEs

SELECT 
    employmentid, 
    MIN(startdate) as startdate, 
    NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
    SELECT 
        employmentid, 
        startdate, 
        enddate,
        DATEADD(
            DAY, 
            -COALESCE(
                SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 
                0
            ),
            startdate
    ) as grp
    FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate

This works by calculating a grp value that will be the same for all consecutive rows. This is achieved by:

  1. Determine totals days the span occupies (+1 as the dates are inclusive)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
  1. Cumulative sum the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans
    • We coalesce with 0 to ensure we dont have NULLs in our cumulative sum of days spanned
    • We do not include current row in our cumulative sum, this is because we will use the value against startdate rather than enddate (we cant use it against enddate because of the NULLs)
SELECT *, COALESCE(
    SUM(daysSpanned) OVER (
        PARTITION BY employmentid 
        ORDER BY startdate 
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    )
    ,0
)  as cumulativeDaysSpanned
FROM (
    SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
  1. Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
    • If the start date increases at the same rate as the days spanned then the days are consecutive, and subtracting the two will give us the same value.
    • If the startdate increases faster than the days spanned then there is a gap and we will get a new grp value greater than the previous one.
    • Although grp is a date, the date itself is meaningless we are using just as a grouping value
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
    SELECT *, COALESCE(
        SUM(daysSpanned) OVER (
            PARTITION BY employmentid 
            ORDER BY startdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        )
        ,0
    )  as cumulativeDaysSpanned
    FROM (
        SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
    ) inner1
) inner2

With the results

+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate               | enddate                 | daysSpanned | cumulativeDaysSpanned | grp                     |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5            | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363        | 0                     | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5            | 2013-05-02 00:00:00.000 | NULL                    | NULL        | 1363                  | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30           | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568        | 0                     | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30           | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574         | 1568                  | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30           | 2012-08-13 00:00:00.000 | NULL                    | NULL        | 2142                  | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66           | 2007-09-24 00:00:00.000 | NULL                    | NULL        | 0                     | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
  1. Finally we can GROUP BY grp to get the get rid of the consecutive days.
    • Use MIN and MAX to get the new startdate and endate
    • To handle the NULL enddate we give them a large value to get picked up by MAX then convert them back to NULL again
SELECT 
    employmentid, 
    MIN(startdate) as startdate, 
    NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
    SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
    FROM (
        SELECT *, COALESCE(
            SUM(daysSpanned) OVER (
                PARTITION BY employmentid 
                ORDER BY startdate 
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            )
            ,0
        )  as cumulativeDaysSpanned
        FROM (
            SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
        ) inner1
    ) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate

To get the desired result

+--------------+-------------------------+-------------------------+
| employmentid | startdate               | enddate                 |
+--------------+-------------------------+-------------------------+
| 5            | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5            | 2013-05-02 00:00:00.000 | NULL                    |
+--------------+-------------------------+-------------------------+
| 30           | 2006-10-02 00:00:00.000 | NULL                    |
+--------------+-------------------------+-------------------------+
| 66           | 2007-09-24 00:00:00.000 | NULL                    |
+--------------+-------------------------+-------------------------+
  1. We can combine the inner queries to get the query at the start of this answer. Which is shorter, but less explainable

Limitations of all this required that

  • there are no overlaps of startdate and enddate for an employment. This could produce collisions in our grp.
  • startdate is not NULL. However this could be overcome by replacing NULL start dates with small date values
  • Future developers can decipher the window black magic you performed
like image 90
Alex Avatar answered Sep 18 '22 21:09

Alex