I have a table that contains hospital visits for patients. I am trying to flag visits in which a visits' begin_date
overlaps the previous visits' end_date
+ 90 days. However, the caveat to this is that once a visit is flagged as an overlap visit, that visit should not be used to assess an overlap with another visit. Let me explain with an example.
Table
visitID patientid begin_date end_date
1 23 1/12/2018 1/14/2018
2 23 1/30/2018 2/14/2018
3 23 4/20/2018 4/22/2018
4 23 5/02/2018 5/03/2018
5 23 7/23/2018 7/28/2018
In the example above, the patient had 5 visits. Visit 2's begin_date
was in range of visit 1's end_date
+ 90 days, so visit 2 should be flagged. Once visit 2 is flagged, that row should not be used in the analysis for any future visits. Conceptually, it would be like removing visit 2 and beginning the analysis again.
interim stage (visit 2 is removed, and analysis begins again)
visitID patientid begin_date end_date
1 23 1/12/2018 1/14/2018
3 23 4/20/2018 4/22/2018
4 23 5/02/2018 5/03/2018
5 23 7/23/2018 7/28/2018
So even though visit 3 overlaps with visit 2, since visit 2 has been removed, visit 3 will not be flagged as the previous visit (now visit 1) is more than end_date
+ 90 days away from visit 3's begin_date
. Then, visit 4 should be flagged as it overlaps with a visit that was not flagged (visit 3). So since visit 4 is flagged, then visit 5 will be removed as it's begin_date
is in the range of visit 3's end_date
+ 90 days.
Anticipated output
visitID patientid begin_date end_date flag
1 23 1/12/2018 1/14/2018 0
2 23 1/30/2018 2/14/2018 1
3 23 4/20/2018 4/22/2018 0
4 23 5/02/2018 5/03/2018 1
5 23 7/23/2018 7/28/2018 1
@gordonlinoff answered a very similar question here, but I am running into issues using recursive CTEs. The difference between the questions is that this question needs to reference another column (end_date
), rather than a single date column. Recursive CTEs are still a new concept to me, but I hope this will help solidify the concept.
My attempt to solve this puzzle (piggy backing off of @gordonlinoff):
with vt as (
select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
from visits_table vt
),
cte as (
select vt.visit, vt.patientid, vt.begin_date, vt.end_date, vt.begin_date as first_begin_date, seqnum
from vt
where seqnum = 1
union all
select vt.visit, vt.patientid, vt.begin_date, vt.end_date,
(case when vt.begin_date > dateadd(day, 90, cte.end_date) then vt.begin_date else cte.end_date end),
vt.seqnum
from cte join
vt
on vt.seqnum = cte.seqnum + 1 and vt.patientid = cte.patientid
)
select cte.visit, cte.patientid, cte.begin_date, cte.end_date,
(case when first_begin_date = begin_date then 0 else 1 end) as flag
from cte
order by cte.patientid, cte.begin_date;
My edits are improperly referencing the end_date based on the results. However, I cannot find where the comparison between begin_date
and end_date
should be.
Dataset:
create table visits_table (visit int,patientid int,begin_date date, end_date date);
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (1,23,'1/12/2018','1/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (2,23,'1/30/2018','2/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (3,23,'4/20/2018','4/22/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (4,23,'5/02/2018','5/03/2018')
Recursive CTE's Execution Order The first step is, execute the initial query (anchor member) which returns the base result set which is used for the next iteration. The second step is, execute the recursive query with the input result set from the previous iteration.
Recursive CTE queries do have a reliance on the unique parent/child keys in order to get the best performance. If this is not possible to achieve, then a WHILE loop is potentially a much more efficient approach to handling the recursive query.
A recursive CTE is useful in querying hierarchical data such as organization charts where one employee reports to a manager or multi-level bill of materials when a product consists of many components, and each component itself also consists of many other components.
The reason behind this error is that by default, maximum number of recursion allowed for CTE is 100.
I adjusted your sample data to make visit 5 in the range of visit 3's end_date + 90 days. Visit 3 end date is 2018-04-22
. If we add 90 days to it, it will be 2018-07-21
. Your sample data in the question has visit 5 start date as 2018-07-23
, which doesn't overlap with 2018-07-21
. So, I adjusted this to be 2018-07-20
to make these dates overlap.
create table visits_table (visit int,patientid int,begin_date date, end_date date);
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES
(1,23,'2018-01-12','2018-01-14'),
(2,23,'2018-01-30','2018-02-14'),
(3,23,'2018-04-20','2018-04-22'),
(4,23,'2018-05-02','2018-05-03'),
(5,23,'2018-07-20','2018-07-28');
You query was pretty close, you just had to calculate both start and end dates of the "previous" interval (first_begin_date, first_end_date)
.
If the "current" interval overlaps with the "previous", you carry over the "previous" interval into the current row.
Uncomment the lines in the query below to see all intermediate values.
with
vt
as
(
select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
from visits_table vt
)
,cte
as
(
select
vt.visit
,vt.patientid
,vt.begin_date as first_begin_date
,vt.end_date as first_end_date
,vt.begin_date
,vt.end_date
,seqnum
from vt
where seqnum = 1
union all
select
vt.visit
,vt.patientid
,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
then cte.first_begin_date -- they overlap, keep the previous interval
else vt.begin_date
end as first_begin_date
,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
then cte.first_end_date -- they overlap, keep the previous interval
else vt.end_date
end as first_end_date
,vt.begin_date
,vt.end_date
,vt.seqnum
from
cte
inner join vt
on vt.seqnum = cte.seqnum + 1
and vt.patientid = cte.patientid
)
select
cte.visit
,cte.patientid
,cte.begin_date
,cte.end_date
,case when first_begin_date = begin_date
then 0 else 1
end as flag
-- ,DATEADD(day, 90, cte.end_date) AS enddd
-- ,*
from cte
order by cte.patientid, cte.begin_date;
Result
+-------+-----------+------------+------------+------+
| visit | patientid | begin_date | end_date | flag |
+-------+-----------+------------+------------+------+
| 1 | 23 | 2018-01-12 | 2018-01-14 | 0 |
| 2 | 23 | 2018-01-30 | 2018-02-14 | 1 |
| 3 | 23 | 2018-04-20 | 2018-04-22 | 0 |
| 4 | 23 | 2018-05-02 | 2018-05-03 | 1 |
| 5 | 23 | 2018-07-20 | 2018-07-28 | 1 |
+-------+-----------+------------+------------+------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With