T-SQL Combine rows in continuation

Question

I have a table that looks like the following. What I want is the the rows in continuation of each other to be grouped together - for each "ID". The column IsContinued marks if the next row should be combined with the current row

My data looks like this:

+-----+--------+-------------+-----------+----------+
| ID  | Period | IsContinued | StartDate | EndDate  |
+-----+--------+-------------+-----------+----------+
| 123 | 1      | 1           | 20180101  | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2      | 1           | 20180501  | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3      | 0           | 20181001  | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4      | 1           | 20190105  | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5      | 0           | 20190401  | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2      | 1           | 20180201  | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3      | 0           | 20180301  | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4      | 0           | 20180501  | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5      | 0           | 20180701  | 20180705 |
+-----+--------+-------------+-----------+----------+

The end result I want is this:

+-----+-------------+-----------+-----------+----------+
| ID  | PeriodStart | PeriodEnd | StartDate | EndDate  |
+-----+-------------+-----------+-----------+----------+
| 123 | 1           | 3         | 20180101  | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4           | 5         | 20190105  | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2           | 3         | 20180201  | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4           | 4         | 20180501  | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5           | 5         | 20180701  | 20180705 |
+-----+-------------+-----------+-----------+----------+

DDL Statement:

CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
                      (123,2,1,'20180501', '20180910'),
                      (123,3,0,'20181001', '20181201'),
                      (123,4,1,'20190105', '20190228'),
                      (123,5,0,'20190401', '20190430'),
                      (456,2,1,'20180201', '20180215'),
                      (456,3,0,'20180301', '20180401'),
                      (456,4,0,'20180501', '20180530'),
                      (456,5,0,'20180701', '20180705')

The code should be run on SQL Server 2016

Thanks!

Aaron Dietz · Accepted Answer

Here is one approach:

with removeFluff as
(
SELECT *
FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
        FROM #Period
     ) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)    
,getValues as
(
SELECT ID,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
       PeriodNr PeriodEnd,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
       EndDate,
       IsContinued
FROM removeFluff r
)

SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0

Output:

ID  PeriodStart PeriodEnd   StartDate   EndDate
123    1           3        2018-01-01  2018-12-01
123    4           5        2019-01-05  2019-04-30
456    2           3        2018-02-01  2018-04-01
456    4           4        2018-05-01  2018-05-30
456    5           5        2018-07-01  2018-07-05

Method:

removeFluff cte removes lines that are unimportant. Theses are the records that don't start or end a segment (line 2 in your sample data)
Now that the fluff is removed, we know that either:
A.) The line is complete on it's own (LAG(IsContinued) ... = 0), ie. previous line is complete
B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
We apply these two cases in the CASE expression of the getValues cte
Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get "start" data on the "end" data row, so we only want to select the end rows

T-SQL Combine rows in continuation

Tags:

sql

tsql

sql-server-2016

Tiawy

1 Answers

Aaron Dietz

Recent Activity

Donate For Us

T-SQL Combine rows in continuation

Tags:

sql

tsql

sql-server-2016

Tiawy

1 Answers

Aaron Dietz

Related questions

Recent Activity

Donate For Us