SQL: best way to build a timeline from two history tables

Tags: sql, tsql

Consider the following schema and sample data:

CREATE TABLE Members (MemberID INT)
INSERT Members VALUES (1001)

CREATE TABLE PCPs (PCPID INT)
INSERT PCPs VALUES (231)
INSERT PCPs VALUES (327)
INSERT PCPs VALUES (390)

CREATE TABLE Plans (PlanID INT)
INSERT Plans VALUES (555)
INSERT Plans VALUES (762)

CREATE TABLE MemberPCP (
    MemberID INT
    , PCP INT
    , StartDate DATETIME
    , EndDate DATETIME)
INSERT MemberPCP VALUES (1001, 231, '2002-01-01', '2002-06-30')
INSERT MemberPCP VALUES (1001, 327, '2002-07-01', '2003-05-31')
INSERT MemberPCP VALUES (1001, 390, '2003-06-01', '2003-12-31')

CREATE TABLE MemberPlans (
    MemberID INT
    , PlanID INT
    , StartDate DATETIME
    , EndDate DATETIME)
INSERT MemberPlans VALUES (1001, 555, '2002-01-01', '2003-03-31')
INSERT MemberPlans VALUES (1001, 762, '2003-04-01', '2003-12-31')

I'm looking for a clean way to construct a timeline of Member/PCP/Plan relationships, in which a change to either a member's PCP or plan produces a separate start/end row in the result. For example, if over a few years a member changed their PCP twice and their plan once, each on a different date, I would expect to see the following:

MemberID  PCP  PlanID  StartDate    EndDate
1001      231  555     2002-01-01   2002-06-30
1001      327  555     2002-07-01   2003-03-31
1001      327  762     2003-04-01   2003-05-31
1001      390  762     2003-06-01   2003-12-31

As you can see, I need a separate result row for each date period in which the Member/PCP/Plan association differs. I have a solution in place, but it is very convoluted, with a lot of CASE expressions and conditional logic in the WHERE clause. I suspect there is a much simpler way to do this.

Thanks.

Rich.Carpenter asked Jun 14 '12 19:06


2 Answers

This is compatible with T-SQL. I agree with Glenn on the general approach.

Another suggestion: if your business rules allow gaps between periods, this code will need further tweaking. Otherwise, I think deriving each EndDate from the next record's StartDate would be better, because it makes the query's behavior more predictable. In that case, you should enforce the no-gaps rule before the data reaches this query.
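
One way to check that rule up front is a query that flags any period that is not immediately followed by the next one. This is only a minimal sketch, assuming SQL Server 2012 or later (for LEAD) and whole-day dates as in the question's sample data; the names Ordered and NextStart are made up for illustration.

```sql
-- Sketch: list MemberPCP periods whose successor does not start
-- exactly one day after they end (a gap or an overlap).
-- Ordered and NextStart are hypothetical names used only here.
WITH Ordered AS (
    SELECT MemberID, StartDate, EndDate,
           NextStart = LEAD(StartDate) OVER (PARTITION BY MemberID
                                             ORDER BY StartDate)
      FROM MemberPCP
)
SELECT MemberID, StartDate, EndDate, NextStart
  FROM Ordered
 WHERE NextStart IS NOT NULL                    -- the last period has no successor
   AND NextStart <> DATEADD(DAY, 1, EndDate);   -- not contiguous
```

The same check can be run against MemberPlans by swapping the table name. With the sample data from the question it should return no rows, since every period starts the day after the previous one ends.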

Edit: I just learned about the WITH statement and SQL Fiddle from Andriy M's post. You can see my answer on SQL Fiddle too.

Edit: Fixed the bug pointed out by Andriy.

-- Every date on which a period can begin: each stored start date,
-- plus the day after each stored end date.
WITH StartDates AS (
SELECT MemberId, StartDate FROM MemberPCP UNION
SELECT MemberId, StartDate FROM MemberPlans UNION
SELECT MemberId, EndDate + 1 FROM MemberPCP UNION
SELECT MemberId, EndDate + 1 FROM MemberPlans
),
-- Every date on which a period can end: each stored end date,
-- plus the day before each stored start date.
EndDates AS (
SELECT MemberId, EndDate = StartDate - 1 FROM MemberPCP UNION
SELECT MemberId, StartDate - 1 FROM MemberPlans UNION
SELECT MemberId, EndDate FROM MemberPCP UNION
SELECT MemberId, EndDate FROM MemberPlans
),
-- Pair each candidate start with the earliest candidate end on or after it.
Periods AS (
SELECT s.MemberId, s.StartDate, EndDate = min(e.EndDate)
  FROM StartDates s
       INNER JOIN EndDates e
           ON s.StartDate <= e.EndDate
          AND s.MemberId = e.MemberId
 GROUP BY s.MemberId, s.StartDate
)
SELECT MemberId = p.MemberId,
       pcp.PCP, pl.PlanId,
       p.StartDate, p.EndDate
  FROM Periods p
       LEFT JOIN MemberPCP pcp
           -- because of the way we divided the periods, at most one record
           -- fits this join clause (assuming a member's PCP rows do not overlap)
           ON p.StartDate >= pcp.StartDate
          AND p.EndDate <= pcp.EndDate
          AND p.MemberId = pcp.MemberId
       LEFT JOIN MemberPlans pl
           ON p.StartDate >= pl.StartDate
          AND p.EndDate <= pl.EndDate
          AND p.MemberId = pl.MemberId
 ORDER BY p.MemberId, p.StartDate
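
The earlier suggestion of deriving EndDate from the next record's StartDate could look roughly like the following. It is a sketch rather than a drop-in replacement: it assumes SQL Server 2012 or later (for LEAD) and that a member's periods are contiguous, so each period ends the day before the next one starts.

```sql
-- Sketch: compute each period's end as the day before the next start.
-- Assumes no gaps between a member's periods.
WITH StartDates AS (
    SELECT MemberID, StartDate FROM MemberPCP
    UNION
    SELECT MemberID, StartDate FROM MemberPlans
)
SELECT MemberID,
       StartDate,
       -- NULL for the member's last period, which has no following start date
       EndDate = DATEADD(DAY, -1,
                         LEAD(StartDate) OVER (PARTITION BY MemberID
                                               ORDER BY StartDate))
  FROM StartDates
 ORDER BY MemberID, StartDate;
```

The last period of each member comes back with a NULL EndDate, so it would still need to be filled in from the stored EndDate values (or left open-ended, depending on the business rule).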
kennethc answered Oct 15 '22 19:10


My approach is to take the distinct set of start dates for each member as the starting point and then build out the other pieces of the query from there:

--
-- Traverse down a list of 
-- unique Member ID and StartDates
-- 
-- For each row find the most 
-- recent PCP for that member 
-- which started on or before
-- the start date of the current
-- row in the traversal
--
-- For each row find the most 
-- recent PlanID for that member
-- which started on or before
-- the start date of the current
-- row in the traversal
-- 
-- For each row find the earliest
-- end date for that member
-- (from a collection of unique
-- member end dates) that falls
-- on or after the start date of
-- the current row in the traversal
-- 
SELECT MemberID,
  (SELECT TOP 1 PCP 
   FROM MemberPCP 
   WHERE MemberID = s.MemberID 
   AND StartDate <= s.StartDate 
   ORDER BY StartDate DESC
  ) AS PCP,
  (SELECT TOP 1 PlanID 
   FROM MemberPlans 
   WHERE MemberID = s.MemberID 
   AND StartDate <= s.StartDate 
   ORDER BY StartDate DESC
  ) AS PlanID,
  StartDate,  
  (SELECT TOP 1 EndDate 
   FROM (
    SELECT MemberID, EndDate 
    FROM MemberPlans 
    UNION 
    SELECT MemberID, EndDate 
    FROM MemberPCP) e
   WHERE e.MemberID = s.MemberID  -- only the current member's end dates
   AND e.EndDate >= s.StartDate 
   ORDER BY e.EndDate
  ) AS EndDate
FROM ( 
  SELECT
    MemberID,
    StartDate
  FROM MemberPlans
  UNION 
  SELECT
    MemberID,
    Startdate
  FROM MemberPCP
) s
ORDER BY MemberID, StartDate
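
For comparison, here is a different technique from the correlated subqueries above: joining the two history tables directly on overlapping date ranges and keeping the intersection of each pair. It is a sketch under the assumption that only periods covered by both a PCP and a plan are wanted; periods with a PCP but no plan (or the reverse) are dropped by the inner join.

```sql
-- Sketch: intersect each PCP period with each overlapping plan period.
-- Assumes rows within each table do not overlap for the same member.
SELECT pcp.MemberID,
       pcp.PCP,
       pl.PlanID,
       -- the intersection starts at the later of the two start dates
       StartDate = CASE WHEN pcp.StartDate > pl.StartDate
                        THEN pcp.StartDate ELSE pl.StartDate END,
       -- and ends at the earlier of the two end dates
       EndDate   = CASE WHEN pcp.EndDate < pl.EndDate
                        THEN pcp.EndDate ELSE pl.EndDate END
  FROM MemberPCP pcp
       INNER JOIN MemberPlans pl
           ON pl.MemberID   = pcp.MemberID
          AND pl.StartDate <= pcp.EndDate    -- the two ranges overlap
          AND pl.EndDate   >= pcp.StartDate
 ORDER BY pcp.MemberID, StartDate;
```

Against the sample data in the question, this pairs PCP 231 with plan 555, PCP 327 with both plans, and PCP 390 with plan 762, which gives the four rows of the desired timeline.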
8kb answered Oct 15 '22 20:10