Group Start End Days
A 5/12/2015 5/14/2015 3
A 5/12/2015 5/14/2015 3
B 1/1/2015 1/3/2015 3
B 1/1/2015 1/3/2015 3
H 1/8/2015 1/9/2015 2
H 1/8/2015 1/9/2015 2
H 1/13/2015 1/15/2015 3
H 1/7/2015 1/17/2015 3
H 1/12/2015 1/22/2015 7
I've attached a sample of my dataset above. I'm trying to count the number of unique days for each group in R. For some observations its quite simple i.e A and B. However there are some groups with varying overlap of days as well as gaps in the date ranges i.e H.
Is there anyway I can summarize the number of unique days (no overlap and accounting for the gaps) for each group in R? i.e A and B would return 3 days respectively, and H would return 11 days.
Group Count
A 3
B 3
H 16
My best guess would be using the dplyr and summarize function, however I haven't been able to wrap my head around any solution. Any help is appreciated! Thank you
Here's a dplyr
solution:
library(dplyr)
df %>%
group_by(Group,rn = row_number()) %>%
do(data.frame(.,Date = seq(as.Date(.$Start,format = '%m/%d/%Y'),
as.Date(.$End,format = '%m/%d/%Y'),
'1 day'))) %>%
group_by(Group) %>%
summarise(numDays = n_distinct(Date))
The idea is to create a new column which contains a sequence of dates from Start to End, and then count the length of unique observations in each group.
This gives:
Group numDays
(fctr) (int)
1 A 3
2 B 3
3 H 16
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With