Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to identify the first gap in multiple start and end date ranges for each distinct member in T-SQL

I have been working on the below but getting no results and the deadline is fast approaching. Also, there are over a million rows as the below. Appreciate your help on the below.

Objective: Group results by MEMBER and build Continuous Coverage Ranges for each Member by combining individual Date Ranges which either overlap or run consecutive to each other with no breaks between the Start & End day of the range.

I have data in the below format:

MemberCode  -----   ClaimID   -----       StartDate   -----       EndDate
00001   -----       012345   -----       2010-01-15   -----       2010-01-20
00001   -----       012350   -----       2010-01-19   -----       2010-01-22
00001   -----       012352   -----       2010-01-20   -----       2010-01-25
00001   -----       012355   -----       2010-01-26   -----       2010-01-30
00002   -----       012357   -----       2010-01-20   -----       2010-01-25
00002   -----       012359   -----       2010-01-30   -----       2010-02-05
00002   -----       012360   -----       2010-02-04   -----       2010-02-15
00003   -----       012365   -----       2010-02-15   -----       2010-02-30

...

In the above the member (00001) is a valid member as there is a continuous date range from 2010-01-15 to 2010-01-30 (with no gaps). Please note that the Claim ID 012355 for this member starts immediately next to the End Date of Claim ID 012352. This is still valid as it forms a continuous range.

However, the member (00002) should be an Invalid member as there is a gap of 5 days between Enddate of Claim ID 012357 and Start Day for Claim ID 012359

What I am trying to do is get a list of ONLY those members who have claims for every single day of the continuous date range (for each member) with no gaps between the MIN(Start-date) and Max(End Date) for each Distinct member. Members who have gaps are discarded.

Thanks in advance.

UPDATE:

I have reached until here. Note: FILLED_DT = Start Date & PresCoverEndDT = End Date

SELECT PresCoverEndDT, FILLED_DT 

FROM 

(

    SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN

    FROM Temp_Claims_PRIOR_STEP_5 T1

    WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

            WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT 

            AND T1.MBR_KEY = T2.MBR_KEY )

) T1

    JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN

        FROM Temp_Claims_PRIOR_STEP_5 T1

        WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

             WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2

     ON T1.RN - 1 = T2.RN

WHERE   PresCoverEndDT < FILLED_DT 

The above code seems to have error as I am getting only one row and that too it is incorrect. My desired output is only 1 column as below:

Valid_Member_Code

00001

00007

00009

... etc.,

like image 673
Vijay Avatar asked Aug 29 '12 12:08

Vijay


People also ask

How do you find overlapping time intervals in SQL?

Basically, a period can be represented by a line fragment on time axis which has two boundaries; starttime and endtime. To claim two time periods to be overlapping, they must have common datetime values which is between lower and upper limits of both periods.

How can I get missing date between two dates in SQL?

You can make a use of While loop if you would like to store missing dates in some table. In this way also it will work. Added not exists clause and not it will not give existing dates. @Lima This query will check if the date exists in table or not, and then insert missing dates.

How do I search between two dates and get all records?

You can use the dateadd function of SQL. This will return ID 1,2,3,4. We are doing a double Dateadd ; the first is to add a day to the current endDate , it will be 2012-03-28 00:00:00, then you subtract one second to make the end date 2012-03- 27 23:59:59.


2 Answers

Try this: http://www.sqlfiddle.com/#!3/c3365/20

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

See query progression here: http://www.sqlfiddle.com/#!3/c3365/20


How it works, compare the current end date to its next start date and check the date gap:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |  -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |  -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |   1 |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |   5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |  -1 |

Then check if a member has same count of claims with no gaps to its total claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;

Output:

| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
|          1 |     3 |             3 |
|          2 |     2 |             1 |

Finally, filter them, members with no gaps in their claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

Output:

| MEMBERCODE |
--------------
|          1 |

Do note that you don't need to do COUNT(*) > 1 to detect members with 2 or more claims. Instead of using LEFT JOIN, we uses JOIN, this will automatically discard members who have yet to have a second claim. Here's the version(longer) if you opt to use LEFT JOIN instead(same output as above):

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two ore more claims only

Here's how see data of above query prior to filtering:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE |    GAP |
-----------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |     -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |     -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |      1 |
|          1 | 2010-01-26 | 2010-01-30 |        (null) | (null) |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |      5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |     -1 |
|          2 | 2010-02-04 | 2010-02-15 |        (null) | (null) |
|          3 | 2010-02-15 | 2010-03-02 |        (null) | (null) |

EDIT on requirement clarification:

On your clarification, you wanted to include members who have yet to have second claim too, do this instead: http://sqlfiddle.com/#!3/c3365/22

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0; 

Output:

| MEMBERCODE |
--------------
|          1 |
|          3 |

The technique is to count the member's nextstartdate, if they have no next start date date(i.e. count(nextstartdate) = 0) then they are single claims only and valid too, then just attach this OR condition:

or count(nextstartdate) = 0; 

Actually, the condition below will suffice too, I wanted to make the query more self-documenting though, hence I recommend counting on member's nextstartdate. Here's an alternative condition for counting members who have yet to have a second claim:

or count(*) = 1;

Btw, we also have to change the comparison from this:

sum(case when gap <= 1 then 1 end) = count(*)

to this(as we are using LEFT JOIN now):

sum(case when gap <= 1 then 1 end) = count(gap)
like image 148
Michael Buen Avatar answered Nov 15 '22 09:11

Michael Buen


Try this, it partitions rows by MemberCode and gives them ordinal numbers. Then it compares rows with subsequent num value, if difference between end date of a row and start date of a next row is greater than one day, it's an invalid member:

DECLARE @t TABLE (MemberCode  VARCHAR(100), ClaimID   
    INT,StartDate   DATETIME,EndDate DATETIME)
INSERT @t
VALUES
('00001'   ,       012345   ,        '2010-01-15'   ,       '2010-01-20')
,('00001'   ,       012350   ,       '2010-01-19'   ,       '2010-01-22')
,('00001'   ,       012352   ,       '2010-01-20'   ,       '2010-01-25')
,('00001'   ,       012355   ,       '2010-01-26'   ,       '2010-01-30')
,('00002'   ,       012357   ,       '2010-01-20'   ,       '2010-01-25')
,('00002'   ,       012359   ,       '2010-01-30'   ,       '2010-02-05')
,('00002'   ,       012360   ,       '2010-02-04'   ,       '2010-02-15')
,('00003'   ,       012365   ,       '2010-02-15'   ,       '2010-02-28')
,('00004'   ,       012366   ,       '2010-03-18'   ,       '2010-03-23')
,('00005'   ,       012367   ,       '2010-03-19'   ,       '2010-03-25')
,('00006'   ,       012368   ,       '2010-03-20'   ,       '2010-03-21')

;WITH tbl AS (

    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate) 
                AS num
    FROM    @t
), invalid AS (

    SELECT  tbl.MemberCode
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1
    AND     tbl.MemberCode = _tbl.MemberCode
    WHERE   DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1  
)

SELECT  MemberCode
FROM    tbl
EXCEPT
SELECT  MemberCode
FROM    invalid
like image 22
Ivan Golović Avatar answered Nov 15 '22 09:11

Ivan Golović