Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to properly apply recursive CTE?

I have a table that contains hospital visits for patients. I am trying to flag visits in which a visits' begin_date overlaps the previous visits' end_date + 90 days. However, the caveat to this is that once a visit is flagged as an overlap visit, that visit should not be used to assess an overlap with another visit. Let me explain with an example.

Table

visitID     patientid    begin_date  end_date
1           23           1/12/2018   1/14/2018
2           23           1/30/2018   2/14/2018
3           23           4/20/2018   4/22/2018
4           23           5/02/2018   5/03/2018
5           23           7/23/2018   7/28/2018

In the example above, the patient had 5 visits. Visit 2's begin_date was in range of visit 1's end_date + 90 days, so visit 2 should be flagged. Once visit 2 is flagged, that row should not be used in the analysis for any future visits. Conceptually, it would be like removing visit 2 and beginning the analysis again.

interim stage (visit 2 is removed, and analysis begins again)

visitID     patientid    begin_date  end_date
1           23           1/12/2018   1/14/2018
3           23           4/20/2018   4/22/2018
4           23           5/02/2018   5/03/2018
5           23           7/23/2018   7/28/2018

So even though visit 3 overlaps with visit 2, since visit 2 has been removed, visit 3 will not be flagged as the previous visit (now visit 1) is more than end_date + 90 days away from visit 3's begin_date. Then, visit 4 should be flagged as it overlaps with a visit that was not flagged (visit 3). So since visit 4 is flagged, then visit 5 will be removed as it's begin_date is in the range of visit 3's end_date + 90 days.

Anticipated output

visitID     patientid    begin_date  end_date    flag
1           23           1/12/2018   1/14/2018   0
2           23           1/30/2018   2/14/2018   1
3           23           4/20/2018   4/22/2018   0
4           23           5/02/2018   5/03/2018   1
5           23           7/23/2018   7/28/2018   1

@gordonlinoff answered a very similar question here, but I am running into issues using recursive CTEs. The difference between the questions is that this question needs to reference another column (end_date), rather than a single date column. Recursive CTEs are still a new concept to me, but I hope this will help solidify the concept.

My attempt to solve this puzzle (piggy backing off of @gordonlinoff):

with vt as (
          select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
          from visits_table vt
         ),
         cte as (
          select vt.visit, vt.patientid, vt.begin_date, vt.end_date, vt.begin_date as first_begin_date, seqnum
          from vt
          where seqnum = 1
          union all
          select vt.visit, vt.patientid, vt.begin_date, vt.end_date,
                 (case when vt.begin_date > dateadd(day, 90, cte.end_date) then vt.begin_date else cte.end_date end),
                 vt.seqnum
          from cte join
               vt
               on vt.seqnum = cte.seqnum + 1 and vt.patientid = cte.patientid
         )
    select cte.visit, cte.patientid, cte.begin_date, cte.end_date,
           (case when first_begin_date = begin_date then 0 else 1 end) as flag
    from cte
    order by cte.patientid, cte.begin_date;

My edits are improperly referencing the end_date based on the results. However, I cannot find where the comparison between begin_date and end_date should be.

Dataset:

create table visits_table (visit int,patientid int,begin_date date, end_date date);

INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (1,23,'1/12/2018','1/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (2,23,'1/30/2018','2/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (3,23,'4/20/2018','4/22/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (4,23,'5/02/2018','5/03/2018')
like image 696
Martin Bobak Avatar asked May 13 '19 22:05

Martin Bobak


People also ask

How do you use recursive CTE?

Recursive CTE's Execution Order The first step is, execute the initial query (anchor member) which returns the base result set which is used for the next iteration. The second step is, execute the recursive query with the input result set from the previous iteration.

How do I improve my recursive CTE performance?

Recursive CTE queries do have a reliance on the unique parent/child keys in order to get the best performance. If this is not possible to achieve, then a WHILE loop is potentially a much more efficient approach to handling the recursive query.

When we use recursive CTE in SQL Server?

A recursive CTE is useful in querying hierarchical data such as organization charts where one employee reports to a manager or multi-level bill of materials when a product consists of many components, and each component itself also consists of many other components.

How many times recursion happen in CTE?

The reason behind this error is that by default, maximum number of recursion allowed for CTE is 100.


1 Answers

I adjusted your sample data to make visit 5 in the range of visit 3's end_date + 90 days. Visit 3 end date is 2018-04-22. If we add 90 days to it, it will be 2018-07-21. Your sample data in the question has visit 5 start date as 2018-07-23, which doesn't overlap with 2018-07-21. So, I adjusted this to be 2018-07-20 to make these dates overlap.

create table visits_table (visit int,patientid int,begin_date date, end_date date);

INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES 
(1,23,'2018-01-12','2018-01-14'),
(2,23,'2018-01-30','2018-02-14'),
(3,23,'2018-04-20','2018-04-22'),
(4,23,'2018-05-02','2018-05-03'),
(5,23,'2018-07-20','2018-07-28');

You query was pretty close, you just had to calculate both start and end dates of the "previous" interval (first_begin_date, first_end_date).

If the "current" interval overlaps with the "previous", you carry over the "previous" interval into the current row.

Uncomment the lines in the query below to see all intermediate values.

with
vt
as
(
    select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
    from visits_table vt
)
,cte
as
(
    select
        vt.visit
        ,vt.patientid
        ,vt.begin_date as first_begin_date
        ,vt.end_date as first_end_date
        ,vt.begin_date
        ,vt.end_date
        ,seqnum
    from vt
    where seqnum = 1

    union all

    select
        vt.visit
        ,vt.patientid
        ,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
            then cte.first_begin_date -- they overlap, keep the previous interval
            else vt.begin_date
        end as first_begin_date
        ,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
            then cte.first_end_date -- they overlap, keep the previous interval
            else vt.end_date
        end as first_end_date
        ,vt.begin_date
        ,vt.end_date
        ,vt.seqnum
    from
        cte
        inner join vt
            on  vt.seqnum = cte.seqnum + 1
            and vt.patientid = cte.patientid
)
select
    cte.visit
    ,cte.patientid
    ,cte.begin_date
    ,cte.end_date
    ,case when first_begin_date = begin_date 
        then 0 else 1
    end as flag
--  ,DATEADD(day, 90, cte.end_date) AS enddd
--  ,*
from cte
order by cte.patientid, cte.begin_date;

Result

+-------+-----------+------------+------------+------+
| visit | patientid | begin_date |  end_date  | flag |
+-------+-----------+------------+------------+------+
|     1 |        23 | 2018-01-12 | 2018-01-14 |    0 |
|     2 |        23 | 2018-01-30 | 2018-02-14 |    1 |
|     3 |        23 | 2018-04-20 | 2018-04-22 |    0 |
|     4 |        23 | 2018-05-02 | 2018-05-03 |    1 |
|     5 |        23 | 2018-07-20 | 2018-07-28 |    1 |
+-------+-----------+------------+------------+------+
like image 152
Vladimir Baranov Avatar answered Oct 22 '22 06:10

Vladimir Baranov