Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities. I have a database view that unions data from all of these tables giving us the ActivityView that looks like this. As you can see some activities overlap - for example while attending an interview, a client may have been performing a CV update activity.

+----------------------+---------------+---------------------+-------------------+
| activity_client_id   | activity_type | activity_start_date | activity_end_date |
+----------------------+---------------+---------------------+-------------------+
|                  112 | Interview     | 2015-06-01 09:00    | 2015-06-01 11:00  |
|                  112 | CV updating   | 2015-06-01 09:30    | 2015-06-01 11:30  |
|                  112 | Course        | 2015-06-02 09:00    | 2015-06-02 16:00  |
|                  112 | Interview     | 2015-06-03 09:00    | 2015-06-03 10:00  |
+----------------------+---------------+---------------------+-------------------+

Each client has a "Sign Up Date", recorded on the client table, which is when they joined our programme. Here it is for our sample client:

+-----------+---------------------+
| client_id | client_sign_up_date |
+-----------+---------------------+
|       112 | 2015-05-20          |
+-----------+---------------------+

I need to create a report that will show the following columns:

+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+

We need this report in order to see how effective our programme is. An important aim of the programme is that we get every client to complete at least 5 hours of activity as quickly as possible. So this report will tell us how long from sign up does it take each client to achieve this figure.

What makes this even trickier is that when we calculate 5 hours of total activity, we must discount overlapping activities:

In the sample data above the client attended an interview between 09:00 and 11:00.
On the same day they also performed CV updating activity from 09:30 to 11:30. For our calculation, this would give them total activity for the day of 2.5 hours (150 minutes) - we would only count 30 minutes of the CV updating as the Interview overlaps it up to 11:00.

So the report for our sample client would give the following result:

+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
|       112 | 2015-05-20          | 2015-06-02                                 |
+-----------+---------------------+--------------------------------------------+

So my question is how can I create the report using a select statement ? I can work out how to do this by writing a stored procedure that will loop through the view and write the result to a report table. But I would much prefer to avoid a stored procedure and have a select statement that will give me the report on the fly.

I am using SQL Server 2005.

like image 610
Brian Avatar asked Oct 19 '22 10:10

Brian


1 Answers

See SQL Fiddle here.

with tbl as (
  -- this will generate daily merged ovelaping time
  select distinct
    a.id
    ,(
        select min(x.starttime) 
        from act x 
        where x.id=a.id and ( x.starttime between a.starttime and a.endtime
          or a.starttime between x.starttime and x.endtime )
    ) start1
    ,(
        select max(x.endtime) 
        from act x 
        where x.id=a.id and ( x.endtime between a.starttime and a.endtime
          or a.endtime between x.starttime and x.endtime )
    ) end1
  from act a

), tbl2 as 
(
  -- this will add minute and total minute column
  select 
    * 
    ,datediff(mi,t.start1,t.end1) mi
    ,(select sum(datediff(mi,x.start1,x.end1)) from tbl x where x.id=t.id and x.end1<=t.end1) totalmi
  from tbl t
), tbl3 as 
(
  -- now final query showing starttime and endtime for 5 hours other wise null in case not completed 5(300 minutes) hours
  select 
    t.id
    ,min(t.start1) starttime
    ,min(case when t.totalmi>300 then t.end1 else null end) endtime
  from tbl2 t
  group by t.id
)
-- final result 
select *
from tbl3
where endtime is not null
like image 108
Mitan Shah Avatar answered Oct 27 '22 09:10

Mitan Shah