Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL Combine rows in continuation

I have a table that looks like the following. What I want is the the rows in continuation of each other to be grouped together - for each "ID". The column IsContinued marks if the next row should be combined with the current row

My data looks like this:

+-----+--------+-------------+-----------+----------+
| ID  | Period | IsContinued | StartDate | EndDate  |
+-----+--------+-------------+-----------+----------+
| 123 | 1      | 1           | 20180101  | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2      | 1           | 20180501  | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3      | 0           | 20181001  | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4      | 1           | 20190105  | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5      | 0           | 20190401  | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2      | 1           | 20180201  | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3      | 0           | 20180301  | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4      | 0           | 20180501  | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5      | 0           | 20180701  | 20180705 |
+-----+--------+-------------+-----------+----------+

The end result I want is this:

+-----+-------------+-----------+-----------+----------+
| ID  | PeriodStart | PeriodEnd | StartDate | EndDate  |
+-----+-------------+-----------+-----------+----------+
| 123 | 1           | 3         | 20180101  | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4           | 5         | 20190105  | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2           | 3         | 20180201  | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4           | 4         | 20180501  | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5           | 5         | 20180701  | 20180705 |
+-----+-------------+-----------+-----------+----------+

DDL Statement:

CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
                      (123,2,1,'20180501', '20180910'),
                      (123,3,0,'20181001', '20181201'),
                      (123,4,1,'20190105', '20190228'),
                      (123,5,0,'20190401', '20190430'),
                      (456,2,1,'20180201', '20180215'),
                      (456,3,0,'20180301', '20180401'),
                      (456,4,0,'20180501', '20180530'),
                      (456,5,0,'20180701', '20180705')

The code should be run on SQL Server 2016

Thanks!

like image 263
Tiawy Avatar asked Sep 06 '18 12:09

Tiawy


1 Answers

Here is one approach:

with removeFluff as
(
SELECT *
FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
        FROM #Period
     ) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)    
,getValues as
(
SELECT ID,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
       PeriodNr PeriodEnd,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
       EndDate,
       IsContinued
FROM removeFluff r
)

SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0

Output:

ID  PeriodStart PeriodEnd   StartDate   EndDate
123    1           3        2018-01-01  2018-12-01
123    4           5        2019-01-05  2019-04-30
456    2           3        2018-02-01  2018-04-01
456    4           4        2018-05-01  2018-05-30
456    5           5        2018-07-01  2018-07-05

Method:

  • removeFluff cte removes lines that are unimportant. Theses are the records that don't start or end a segment (line 2 in your sample data)
  • Now that the fluff is removed, we know that either:
  • A.) The line is complete on it's own (LAG(IsContinued) ... = 0), ie. previous line is complete
  • B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
  • We apply these two cases in the CASE expression of the getValues cte
  • Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get "start" data on the "end" data row, so we only want to select the end rows
like image 124
Aaron Dietz Avatar answered Nov 03 '22 11:11

Aaron Dietz