I am looking to write an SQL query that identifies the longest period that an individual has gone without having a meat meal. Ideally the output would look like
person periodstart periodend
where for each person the longest period without meat would be identified and
periodstart would be the time of the first non-meat meal
periodend would be the time of the first meat meal following.
SQL below creates table and data .
CREATE TABLE MEALS
(
PERSON VARCHAR2(20 BYTE)
, MEALTIME DATE
, FOODTYPE VARCHAR2(20)
);
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Jane',to_date('04-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Jane',to_date('05-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Jane',to_date('07-JAN-15 06:01:24','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Jane',to_date('07-JAN-15 12:03:50','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('02-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('03-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('04-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('05-JAN-15 07:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('05-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('John',to_date('06-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('02-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('03-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('05-JAN-15 04:04:25','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE)
values ('Mary',to_date('07-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'veg');
commit;
This is a gaps-and-islands problem, and there are various ways to approach it. One is to use an analytic function effect/trick to finds the chains of contiguous periods for each type:
select person, mealtime, foodtype,
case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
dense_rank() over (partition by person,
case when foodtype = 'meat' then 1 else 0 end order by mealtime)
- dense_rank() over (partition by person order by mealtime) as chain
from meals
order by person, mealtime;
The 'chain' pseudocolumn is based on a case
here, as you want fruit and veg - or anything non-meat - treated the same.
You can then use that as an inner query to find the start of each meat and non-meat period, from the first meal in each chain:
select person, meat, min(mealtime) as first_meal
from (
select person, mealtime, foodtype,
case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
dense_rank() over (partition by person,
case when foodtype = 'meat' then 1 else 0 end order by mealtime)
- dense_rank() over (partition by person order by mealtime) as chain
from meals
)
group by person, meat, chain
order by person, min(mealtime);
PERSON MEAT FIRST_MEAL
-------------------- ---- ------------------
Jane No 04-JAN-15 06:09:09
Jane Yes 07-JAN-15 06:01:24
Jane No 07-JAN-15 12:03:50
John No 02-JAN-15 10:03:23
...
You want the period to cover the first non-meat meal to the next meat-meal, so you can use that as an inner query with lead and lag to peek at the rows either side: for a veg period you peek ahead to see the start of the next meat period; for a meat period you peek back to see he start of the previous veg period:
select person, meat,
case when meat = 'Yes' then lag(first_meal) over (partition by person
order by first_meal) else first_meal end as period_start,
case when meat = 'No' then lead(first_meal) over (partition by person
order by first_meal) else first_meal end as period_end
from (
select person, meat, min(mealtime) as first_meal
from (
select person, mealtime, foodtype,
case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
dense_rank() over (partition by person,
case when foodtype = 'meat' then 1 else 0 end order by mealtime)
- dense_rank() over (partition by person order by mealtime) as chain
from meals
)
group by person, meat, chain
)
order by person, period_start;
PERSON MEAT PERIOD_START PERIOD_END
-------------------- ---- ------------------ ------------------
Jane No 04-JAN-15 06:09:09 07-JAN-15 06:01:24
Jane Yes 04-JAN-15 06:09:09 07-JAN-15 06:01:24
Jane No 07-JAN-15 12:03:50
John No 02-JAN-15 10:03:23 03-JAN-15 10:03:23
...
That gives you duplicates, effectively, though I've left the 'meat' flag in to make it a bit clearer at this point. Assuming you want to ignore the latest open-ended period you just need to skip those and eliminate the duplicates:
select person, period_start, period_end
from (
select person, meat,
case when meat = 'Yes' then lag(first_meal) over (partition by person
order by first_meal) else first_meal end as period_start,
case when meat = 'No' then lead(first_meal) over (partition by person
order by first_meal) else first_meal end as period_end
from (
select person, meat, min(mealtime) as first_meal
from (
select person, mealtime, foodtype,
case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
dense_rank() over (partition by person,
case when foodtype = 'meat' then 1 else 0 end order by mealtime)
- dense_rank() over (partition by person order by mealtime) as chain
from meals
)
group by person, meat, chain
)
)
where meat = 'No'
and period_start is not null
and period_end is not null
order by person, period_start;
PERSON PERIOD_START PERIOD_END
-------------------- ------------------ ------------------
Jane 04-JAN-15 06:09:09 07-JAN-15 06:01:24
John 02-JAN-15 10:03:23 03-JAN-15 10:03:23
John 04-JAN-15 10:03:23 06-JAN-15 10:03:23
Mary 02-JAN-15 05:01:54 03-JAN-15 06:04:25
Mary 05-JAN-15 04:04:25 05-JAN-15 06:04:25
SQL Fiddle with the intermediate steps in full.
Belatedly realised you only wanted the longest period for each person, which you can get with yet another layer:
select person, period_start, period_end
from (
select person, period_start, period_end,
rank() over (partition by person order by period_end - period_start desc) as rnk
from (
...
)
where meat = 'No'
and period_start is not null
and period_end is not null
)
where rnk = 1
order by person, period_start;
PERSON PERIOD_START PERIOD_END
-------------------- ------------------ ------------------
Jane 04-JAN-15 06:09:09 07-JAN-15 06:01:24
John 04-JAN-15 10:03:23 06-JAN-15 10:03:23
Mary 02-JAN-15 05:01:54 03-JAN-15 06:04:25
Updated SQL Fiddle.
solution is in SQL SERVER i hope that you can understand easily
with x as (
select ROW_NUMBER()over( Partition by person order by MealTime) rowId,* from #MEALS
)
,y as (
select ROW_NUMBER() over( Partition by person order by MealTime) rowID, * from
#MEALS where FOODTYPE='meat')
select x.PERSON,x.MEALTIME startdate,y.MEALTIME endDate, datediff(second,x.MEALTIME,y.MEALTIME) diff from x
left join
y on x.PERSON=y.PERSON where
x.rowId=1 and y.rowID=1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With