
Recursive CTE - consolidate start and end dates

I have the following table:

row_num customer_status    effective_from_datetime
------- ------------------ -----------------------
1       Active             2011-01-01
2       Active             2011-01-02
3       Active             2011-01-03
4       Suspended          2011-01-04
5       Suspended          2011-01-05
6       Active             2011-01-06

I am trying to achieve the following result, where consecutive rows with the same status are merged into one row with an effective from/to date range:

customer_status effective_from_datetime effective_to_datetime
--------------- ----------------------- ---------------------
Active          2011-01-01              2011-01-04
Suspended       2011-01-04              2011-01-06
Active          2011-01-06              NULL

I can get a recursive CTE to output the correct effective_to_datetime based on the next row, but am having trouble merging the ranges.

Code to generate sample data:

CREATE TABLE #temp
(
row_num INT IDENTITY(1,1),
customer_status VARCHAR(10),
effective_from_datetime DATE
)

INSERT INTO #temp
VALUES 
('Active','2011-01-01')
,('Active','2011-01-02')
,('Active','2011-01-03')
,('Suspended','2011-01-04')
,('Suspended','2011-01-05')
,('Active','2011-01-06')

shakedown7 asked Dec 16 '11 15:12



1 Answer

EDIT: SQL updated as per comment.

WITH
  group_assigned_data AS
(
  -- Number the rows overall and within each status. The difference between the
  -- two numbers stays constant across a run of consecutive rows with the same status.
  SELECT
    ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_datetime) AS status_sequence_id,
    ROW_NUMBER() OVER (                             ORDER BY effective_from_datetime) AS sequence_id,
    customer_status,
    effective_from_datetime
  FROM
    #temp
)
,
  grouped_data AS
(
  -- Collapse each run into one row holding its first and last date.
  SELECT
    customer_status,
    MIN(effective_from_datetime)   AS min_effective_from_date,
    MAX(effective_from_datetime)   AS max_effective_from_date
  FROM
    group_assigned_data
  GROUP BY
    customer_status,
    sequence_id - status_sequence_id
)
SELECT
  [current].customer_status,
  [current].min_effective_from_date       AS effective_from,
  [next].min_effective_from_date          AS effective_to
FROM
  grouped_data   AS [current]
LEFT JOIN
  grouped_data   AS [next]
    -- The next run starts the day after the current run ends.
    ON [next].min_effective_from_date = DATEADD(DAY, 1, [current].max_effective_from_date)
ORDER BY
  [current].min_effective_from_date

This isn't recursive, but that's possibly a good thing.
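
To see why grouping by sequence_id - status_sequence_id works, it may help to look at the intermediate values. The following is only an illustrative sketch run against the question's #temp sample table; the group_key alias is a name made up for this example.

```sql
-- Show the two row numbers and their difference for each sample row.
SELECT
  effective_from_datetime,
  customer_status,
  ROW_NUMBER() OVER (ORDER BY effective_from_datetime) AS sequence_id,
  ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_datetime) AS status_sequence_id,
  ROW_NUMBER() OVER (ORDER BY effective_from_datetime)
    - ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_datetime) AS group_key  -- hypothetical alias
FROM
  #temp
ORDER BY
  effective_from_datetime;

-- effective_from_datetime customer_status sequence_id status_sequence_id group_key
-- 2011-01-01              Active          1           1                  0
-- 2011-01-02              Active          2           2                  0
-- 2011-01-03              Active          3           3                  0
-- 2011-01-04              Suspended       4           1                  3
-- 2011-01-05              Suspended       5           2                  3
-- 2011-01-06              Active          6           4                  2
```

Within a run both row numbers increase by one per row, so the difference is constant. The difference changes as soon as a row with a different status interrupts the run, which is why (customer_status, group_key) identifies each run.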


It doesn't deal with gaps in your data. To deal with that, you could create a calendar table containing every relevant date, join on it to fill missing dates with an 'unknown' status, and then run the query against the result. (In fact, you can do it in a CTE that is used by the CTE above.)

At present...
- If row 2 was missing, it would not change the result
- If row 3 was missing, the effective_to of the first row would change

Different behaviour can be determined by preparing your data, or other methods. We'd need to know the business logic you need though.
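
As a rough sketch of the calendar idea mentioned above, a recursive CTE can generate one row per day and a LEFT JOIN can fill the missing dates. This assumes daily granularity and at most one status row per date; the names calendar, calendar_date, @first and @last are made up for this example, and 'Unknown' is just one possible filler value.

```sql
-- Hypothetical bounds: cover the range of dates present in the sample table.
DECLARE @first DATE = (SELECT MIN(effective_from_datetime) FROM #temp);
DECLARE @last  DATE = (SELECT MAX(effective_from_datetime) FROM #temp);

WITH
  calendar AS
(
  -- Anchor row: the first date in the data.
  SELECT @first AS calendar_date
  UNION ALL
  -- Recursive step: add one day until the last date is reached.
  SELECT DATEADD(DAY, 1, calendar_date)
  FROM calendar
  WHERE calendar_date < @last
)
SELECT
  calendar.calendar_date                   AS effective_from_datetime,
  COALESCE(t.customer_status, 'Unknown')   AS customer_status   -- filler for dates with no row
FROM
  calendar
LEFT JOIN
  #temp AS t
    ON t.effective_from_datetime = calendar.calendar_date
ORDER BY
  calendar.calendar_date
OPTION (MAXRECURSION 0);   -- lift the default limit of 100 recursion levels for long ranges
```

The sample data has no gaps, so here the output matches the input rows. With a missing date, that date would appear with the 'Unknown' status, and the filled result could then be used in place of the source table in the first CTE of the main query. Whether a gap should count as 'Unknown' or as a continuation of the previous status depends on the business logic.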


If any one date can have multiple status entries, you need to define what logic you want it to follow. At present the behaviour is undefined, but you could make it deterministic simply by adding customer_status to the ORDER BY portions of ROW_NUMBER().
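
A minimal sketch of that tie-breaker, written against the question's #temp table. Ordering ties alphabetically by status is an arbitrary choice made for illustration, not a recommendation for any particular business rule.

```sql
-- Same row numbering as in the main query, with customer_status added as a
-- tie-breaker so rows sharing a date are always numbered in the same order.
SELECT
  ROW_NUMBER() OVER (PARTITION BY customer_status
                     ORDER BY effective_from_datetime, customer_status) AS status_sequence_id,
  ROW_NUMBER() OVER (ORDER BY effective_from_datetime, customer_status) AS sequence_id,
  customer_status,
  effective_from_datetime
FROM
  #temp;
```

With this ordering, an 'Active' row always sorts before a 'Suspended' row on the same date. The extra column has no effect inside the partitioned window, where the status is constant, but keeping the two ORDER BY clauses identical makes the intent easier to read.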

MatBailie answered Nov 07 '22 17:11