Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

% of participants that are retained during six months

I am a school teacher very new to MS SQL server. Everyone is suggesting to try this site out. Here goes!

I am trying to write queries to test different types of outcome measures for the participation in a academic program. There are several different ways to calculate this outcome measurement I would like to try. The outcome which I am trying to Calculate is: What is the % of participants that are retained during six months of the program? I am testing different ways to define participant and different time ranges. There are 4 queries I am trying to produce. Unfortunately, I have to use for different tables: Attendance, Status, Deenrolled, Inactive. I have included sample data from each below

Queries

  1. A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at July 1st 2012 and Ending June 30th 2013, so the length of the fiscal year.If a participant is deenrolled or inactive they are dropped.
  2. A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at January 1st 2013. If a participant is deenrolled or becomes inactive they are dropped.
  3. A participant is defined as everyone that attended a class at least twice a week starting at January 1st 2013 until today
  4. A participant is defined as a student's enrollment start date until they are deenrolled or become inactive.

Participant (Numerator) participant / all students which were served (Denominator)

The 4 query outputs I am looking for are different versions of this:

Example

Participants    Served   Percent_Served
75               100        75%      

I have been messing around with different versions of the query below

 SELECT 
Count (distinct ID) as Count, 
  Count  ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) > 2 as Participants ,
FROM Attendance
where Attendence_date date between '07/01/2012' and '06/30/2013'
and ID not in (Select ID from Inactive) 
or ID not in (select ID from Deenrolled) 
GROUP BY ID

and

 SELECT 
Count (distinct ID) as Count, 
  Count  ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) - Enrolled_Date  as Participants ,
FROM Attendance
where Attendence_date date between '07/01/2012' and '06/30/2013'
and ID not in (Select ID from Inactive) 
or ID not in (select ID from Deenrolled) 
GROUP BY ID

Any programming assistance for these queries is greatly appreciated.

Below are the sample/example datasets.

Attendence_date is the date a student participated in one class.

CREATE TABLE Attendance (
    ID int,
    Attendence_date datetime
    )

INSERT INTO Attendance VALUES 
(4504498,  '7/1/2012'),
(4504498,  '7/2/2012'),
(4504498,   '7/3/2012'),
(4504498,   '7/4/2012'),
(4504498,   '7/5/2012'),
(4504498,   '7/8/2012'),
(4504498,   '7/9/2012'),
(4504498,   '7/10/2012'),
(4504498,   '7/11/2012'),
(4504498,   '7/12/2012'),
(4504498,   '7/1/2012'),
(4504498,   '7/2/2012'),
(4504498,   '7/3/2012'),
(4504498,   '7/4/2012'),
(4504498,   '7/5/2012'),
(4504498,   '7/8/2012'),
(4504498,   '7/9/2012'),
(4504498,   '7/10/2012'),
(4504498,   '7/11/2012'),
(4504498,   '7/12/2012'),
(9201052,   '7/15/2012'),
(9201052,   '7/16/2012'),
(9201052,   '7/17/2012'),
(9201052,   '7/17/2012'),
(9201052,   '7/18/2012'),   
(7949745,   '7/17/2012'),   
(7949745,   '7/18/2012'),
(7949745,   '7/23/2012'),   
(7949745,   '7/23/2012'),   
(7949745,   '7/24/2012'),
(7949745,   '7/26/2012'),
(7949745,   '7/26/2012'),   
(7949745,   '8/8/2012'),    
(7949745,   '8/8/2012'),    
(7949745,   '11/5/2012'),   
(7949745,   '11/5/2012'),   
(7949745,   '11/5/2012'),   
(7949745,   '11/6/2012'),   
(7949745,   '11/6/2012'),   
(7949745,   '11/6/2012'),   
(7949745,   '11/7/2012'),   
(7949745,   '11/7/2012'),   
(7949745,   '11/7/2012')

Here is the contains the enrollment date.

CREATE TABLE [Status] (
    ID int,
    Intake_Date datetime ,
   Engaged_Date datetime ,
   Enrolled_Date datetime)
INSERT INTO [Status] VALUES 
(7949745, '3/7/2012',   '7/17/2012', '3/8/2012'),
(4504498, '2/21/2013',  '3/5/2013',  '3/22/2013'),
(1486279, '4/18/2013',  '5/7/2013',   '5/20/2013'),
(9201052, '5/15/2012',  '7/13/2012',  '5/15/2012'),
(1722390, '3/5/2012',   '8/27/2012', '3/8/2012'),
(7735695, '9/7/2012',   '9/7/2012',  '9/28/2012'),
(9261549, '3/7/2012',   '7/24/2012', '3/8/2012'),
(3857008, '3/15/2013',  '3/18/2013', '4/3/2013'),
(8502583, '3/14/2013',     '4/15/2013', '5/3/2013'),
(1209774,  '4/19/2012',  '1/1/2012',   '4/24/2012') 

Here is the contains the de-enrollment date.

CREATE TABLE Deenrolled (
    ID int,
    Deenrolled_Date datetime)
INSERT INTO Deenrolled  VALUES 
(7949745,    '2/4/2013'),
(5485272,    '07/08/2013'),
(8955628,    '01/10/2013'),
(5123221,    '7/8/2013'),
(5774753,    '7/18/2013'),
(3005451,    '2/18/2013'),
(7518818,    '05/29/2013'),
(9656985,    '6/20/2013'),
(2438101,    '7/17/2013'),
(1437052,    '7/25/2013'),
(9133874,    '4/25/2013'),
(7007375,    '6/19/2013'),
(3178181,    '5/24/2013')

And inactive

CREATE TABLE Inactive (
    ID int,
   Effect_Date datetime)
INSERT INTO Inactive VALUES 
(1209774,       '10/12/2012'),
(5419494,       '10/12/2012'),
(4853049,       '10/9/2012'),
(1453678,       '5/23/2013'),
(1111554,       '7/16/2012'),
(5564128,       '2/15/2013'),
(1769234,       '7/16/2012')
like image 669
Matty34 Avatar asked Jul 29 '13 20:07

Matty34


2 Answers

Well I should say that is not an easy one. The main problem was to solve 'at least twice a week for a sixth months' part - it's easy to calculate twice a week, but it should be 6 continous months!

While I've tried to solve it, I've found absolutely brilliant answer by Niels van der Rest - Finding continuous ranges in a set of numbers. So I'll give you general query for the Part 1, you can change parameters and get result for Part 2:

declare @Weeks int, @PerWeek int, @StartDate date, @EndDate date, @count

select
    @StartDate = '20120701',
    @EndDate = '20130630',
    @Weeks = 26, -- 6 month or 26 weeks
    @PerWeek = 2 -- twice per week

select @count = count(distinct A.ID)
from Attendance as A
where
    A.Attendence_date between @StartDate and @EndDate and
    A.ID not in (select T.ID from Deenrolled as T) and
    A.ID not in (select T.ID from Inactive as T)

;with CTE as (
    -- Week numbers, filter by dates
    select
        A.ID,
        datediff(dd, @StartDate, A.Attendence_date) / 7 as Wk
    from Attendance as A
    where
        A.Attendence_date between @StartDate and @EndDate and
        A.ID not in (select T.ID from Deenrolled as T) and
        A.ID not in (select T.ID from Inactive as T)
  ), CTE2 as (
    -- Group by week, filter less then @PerWeek per week, calculate row number
    select
        Wk, ID,
        row_number() over (partition by ID order by Wk) as Row_Num
    from CTE
    group by Wk, ID
    having count(*) >= @PerWeek
)
-- Final query - group by difference between week and row_number
select 100 * cast(count(distinct ID) as float) / @count
from CTE2
group by ID, Wk - Row_Num
having count(*) >= @Weeks

I've created SQL FIDDLE EXAMPLE, you can test the query.

Part 3 is easy

declare @PerWeek int, @StartDate date

select
    @StartDate = '20130101',
    @PerWeek = 2 -- twice per week

select @count = count(distinct A.ID)
from Attendance as A
where
    A.Attendence_date >= @StartDate and
    A.ID not in (select T.ID from Deenrolled as T) and
    A.ID not in (select T.ID from Inactive as T)

;with CTE as (
    -- Week numbers, filter by dates
    select
        A.ID,
        datediff(dd, @StartDate, A.Attendence_date) / 7 as Wk
    from Attendance as A
    where
        A.Attendence_date >= @StartDate and
        A.ID not in (select T.ID from Deenrolled as T) and
        A.ID not in (select T.ID from Inactive as T)
  ), CTE2 as (
    -- Group by week, filter less then @PerWeek per week
    select distinct ID
    from CTE
    group by Wk, ID
    having count(*) >= @PerWeek
)
select 100 * cast(count(*) as float) / @count from CTE2

Part 4 seems a bit unclear for me, could you clarify?

like image 173
Roman Pekar Avatar answered Oct 11 '22 14:10

Roman Pekar


Give these a shot (changed because I missed a huge part of the question)

    SELECT  B.ID FROM
(SELECT number
      FROM master.dbo.spt_values
      WHERE TYPE = 'P' AND number < datediff(week, '07/01/2012', '06/30/2013') ) AS W
    JOIN
(SELECT A.ID, weeknum
  FROM
    (SELECT ID,  datediff(week, '07/01/2012',Attendence_date) AS weeknum
      FROM Attendance
      WHERE Attendence_date  BETWEEN '07/01/2012' AND '06/30/2013'
        AND ID NOT IN (SELECT ID FROM Deenrolled)
        AND ID NOT IN (SELECT ID FROM Inactive)) AS A
  GROUP BY A.ID, A.weeknum
  HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', '06/30/2013');

SELECT  B.ID FROM
(SELECT number
      FROM master.dbo.spt_values
      WHERE TYPE = 'P' AND number < datediff(week, '01/01/2013', '06/30/2013') ) AS W
    JOIN
(SELECT A.ID, weeknum
  FROM
    (SELECT ID,  datediff(week, '01/01/2013',Attendence_date) AS weeknum
      FROM Attendance
      WHERE Attendence_date  BETWEEN '01/01/2013' AND '06/30/2013'
        AND ID NOT IN (SELECT ID FROM Deenrolled)
        AND ID NOT IN (SELECT ID FROM Inactive)) AS A
  GROUP BY A.ID, A.weeknum
  HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', '06/30/2013');

SELECT  B.ID FROM
(SELECT number
      FROM master.dbo.spt_values
      WHERE TYPE = 'P' AND number < datediff(week, '01/01/2013', '06/30/2013') ) AS W
    JOIN
(SELECT A.ID, weeknum
  FROM
    (SELECT ID,  datediff(week, '01/01/2013',GetDate()) AS weeknum
      FROM Attendance
      WHERE Attendence_date  BETWEEN '01/01/2013' AND GetDate()
        AND ID NOT IN (SELECT ID FROM Deenrolled)
        AND ID NOT IN (SELECT ID FROM Inactive)) AS A
  GROUP BY A.ID, A.weeknum
  HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', GetDate());

SELECT DISTINCT(Attendance.ID)
  FROM Attendance
  WHERE Attendance.ID NOT IN (SELECT ID FROM Deenrolled)
      AND ID NOT IN (SELECT ID FROM Inactive);

and an sqlfiddle to help you out: http://sqlfiddle.com/#!6/97230/3 Good luck!

like image 36
Barbara Laird Avatar answered Oct 11 '22 13:10

Barbara Laird