Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting unique days with overlap and gaps in date ranges

Tags:

date

r

dplyr

Group       Start            End             Days
A           5/12/2015        5/14/2015       3
A           5/12/2015        5/14/2015       3
B           1/1/2015         1/3/2015        3
B           1/1/2015         1/3/2015        3
H           1/8/2015         1/9/2015        2
H           1/8/2015         1/9/2015        2
H           1/13/2015        1/15/2015       3
H           1/7/2015         1/17/2015       3
H           1/12/2015        1/22/2015       7

I've attached a sample of my dataset above. I'm trying to count the number of unique days for each group in R. For some observations its quite simple i.e A and B. However there are some groups with varying overlap of days as well as gaps in the date ranges i.e H.

Is there anyway I can summarize the number of unique days (no overlap and accounting for the gaps) for each group in R? i.e A and B would return 3 days respectively, and H would return 11 days.

Group   Count
A       3
B       3
H       16

My best guess would be using the dplyr and summarize function, however I haven't been able to wrap my head around any solution. Any help is appreciated! Thank you

like image 619
Michael Luu Avatar asked May 12 '16 19:05

Michael Luu


1 Answers

Here's a dplyr solution:

library(dplyr)

df %>%
    group_by(Group,rn = row_number()) %>%
    do(data.frame(.,Date = seq(as.Date(.$Start,format = '%m/%d/%Y'),
                               as.Date(.$End,format = '%m/%d/%Y'),
                               '1 day'))) %>%
    group_by(Group) %>%
    summarise(numDays = n_distinct(Date))

The idea is to create a new column which contains a sequence of dates from Start to End, and then count the length of unique observations in each group.

This gives:

   Group numDays
  (fctr)   (int)
1      A       3
2      B       3
3      H      16
like image 117
shreyasgm Avatar answered Nov 10 '22 00:11

shreyasgm