Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

identifying a period with particular characteristic using sql

I am looking to write an SQL query that identifies the longest period that an individual has gone without having a meat meal. Ideally the output would look like

person  periodstart  periodend 

where for each person the longest period without meat would be identified and

periodstart would be the time of the first non-meat meal

periodend would be the time of the first meat meal following.

SQL below creates table and data .

CREATE TABLE MEALS 
(
  PERSON VARCHAR2(20 BYTE) 
, MEALTIME DATE 
, FOODTYPE VARCHAR2(20) 
);

Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('04-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('05-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('07-JAN-15 06:01:24','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('07-JAN-15 12:03:50','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('02-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('03-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('04-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('05-JAN-15 07:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('05-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('02-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('03-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 04:04:25','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('07-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'veg');

commit;
like image 301
Tony Wolff Avatar asked Jan 08 '15 06:01

Tony Wolff


2 Answers

This is a gaps-and-islands problem, and there are various ways to approach it. One is to use an analytic function effect/trick to finds the chains of contiguous periods for each type:

select person, mealtime, foodtype,
  case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
  dense_rank() over (partition by person,
      case when foodtype = 'meat' then 1 else 0 end order by mealtime)
    - dense_rank() over (partition by person order by mealtime) as chain
from meals
order by person, mealtime;

The 'chain' pseudocolumn is based on a case here, as you want fruit and veg - or anything non-meat - treated the same.

You can then use that as an inner query to find the start of each meat and non-meat period, from the first meal in each chain:

select person, meat, min(mealtime) as first_meal
from (
  select person, mealtime, foodtype,
    case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
    dense_rank() over (partition by person,
        case when foodtype = 'meat' then 1 else 0 end order by mealtime)
      - dense_rank() over (partition by person order by mealtime) as chain
  from meals
)
group by person, meat, chain
order by person, min(mealtime);

PERSON               MEAT FIRST_MEAL       
-------------------- ---- ------------------
Jane                 No   04-JAN-15 06:09:09 
Jane                 Yes  07-JAN-15 06:01:24 
Jane                 No   07-JAN-15 12:03:50 
John                 No   02-JAN-15 10:03:23 
...

You want the period to cover the first non-meat meal to the next meat-meal, so you can use that as an inner query with lead and lag to peek at the rows either side: for a veg period you peek ahead to see the start of the next meat period; for a meat period you peek back to see he start of the previous veg period:

select person, meat,
  case when meat = 'Yes' then lag(first_meal) over (partition by person
      order by first_meal) else first_meal end as period_start,
  case when meat = 'No' then lead(first_meal) over (partition by person
      order by first_meal) else first_meal end as period_end
from (
  select person, meat, min(mealtime) as first_meal
  from (
    select person, mealtime, foodtype,
      case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
      dense_rank() over (partition by person,
          case when foodtype = 'meat' then 1 else 0 end order by mealtime)
        - dense_rank() over (partition by person order by mealtime) as chain
    from meals
  )
  group by person, meat, chain
)
order by person, period_start;

PERSON               MEAT PERIOD_START       PERIOD_END       
-------------------- ---- ------------------ ------------------
Jane                 No   04-JAN-15 06:09:09 07-JAN-15 06:01:24 
Jane                 Yes  04-JAN-15 06:09:09 07-JAN-15 06:01:24 
Jane                 No   07-JAN-15 12:03:50                    
John                 No   02-JAN-15 10:03:23 03-JAN-15 10:03:23 
...

That gives you duplicates, effectively, though I've left the 'meat' flag in to make it a bit clearer at this point. Assuming you want to ignore the latest open-ended period you just need to skip those and eliminate the duplicates:

select person, period_start, period_end
from (
  select person, meat,
    case when meat = 'Yes' then lag(first_meal) over (partition by person
        order by first_meal) else first_meal end as period_start,
    case when meat = 'No' then lead(first_meal) over (partition by person
        order by first_meal) else first_meal end as period_end
  from (
    select person, meat, min(mealtime) as first_meal
    from (
      select person, mealtime, foodtype,
        case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
        dense_rank() over (partition by person,
            case when foodtype = 'meat' then 1 else 0 end order by mealtime)
          - dense_rank() over (partition by person order by mealtime) as chain
      from meals
    )
    group by person, meat, chain
  )
)
where meat = 'No'
and period_start is not null
and period_end is not null
order by person, period_start;

PERSON               PERIOD_START       PERIOD_END       
-------------------- ------------------ ------------------
Jane                 04-JAN-15 06:09:09 07-JAN-15 06:01:24 
John                 02-JAN-15 10:03:23 03-JAN-15 10:03:23 
John                 04-JAN-15 10:03:23 06-JAN-15 10:03:23 
Mary                 02-JAN-15 05:01:54 03-JAN-15 06:04:25 
Mary                 05-JAN-15 04:04:25 05-JAN-15 06:04:25 

SQL Fiddle with the intermediate steps in full.

Belatedly realised you only wanted the longest period for each person, which you can get with yet another layer:

select person, period_start, period_end
from (
  select person, period_start, period_end,
    rank() over (partition by person order by period_end - period_start desc) as rnk
  from (
    ...
  )
  where meat = 'No'
  and period_start is not null
  and period_end is not null
)
where rnk = 1
order by person, period_start;

PERSON               PERIOD_START       PERIOD_END       
-------------------- ------------------ ------------------
Jane                 04-JAN-15 06:09:09 07-JAN-15 06:01:24 
John                 04-JAN-15 10:03:23 06-JAN-15 10:03:23 
Mary                 02-JAN-15 05:01:54 03-JAN-15 06:04:25 

Updated SQL Fiddle.

like image 175
Alex Poole Avatar answered Nov 15 '22 08:11

Alex Poole


solution is in SQL SERVER i hope that you can understand easily

with x as (
 select ROW_NUMBER()over( Partition by person order by MealTime) rowId,* from #MEALS
)
,y as (
select ROW_NUMBER() over( Partition by person order by MealTime) rowID, * from 
#MEALS where FOODTYPE='meat')
select x.PERSON,x.MEALTIME startdate,y.MEALTIME endDate,        datediff(second,x.MEALTIME,y.MEALTIME) diff from x 
left join 
y on x.PERSON=y.PERSON where 
x.rowId=1 and y.rowID=1
like image 32
Siva Ganesh Avatar answered Nov 15 '22 10:11

Siva Ganesh