I'm trying to group a data frame by 3 date ranges based on "2016-04-10" and "2016-04-24".
df <- structure(list(date = structure(c(16803, 16810, 16817, 16824,
16831, 16838, 16845, 16852, 16859, 16866, 16873, 16880, 16887,
16894, 16901, 16908, 16915, 16922, 16929, 16936, 16943), class = "Date"),
new = c(1507L, 2851L, 3550L, 5329L, 7557L, 5546L, 6264L,
7160L, 9468L, 5789L, 5928L, 4642L, 8145L, 4867L, 4846L, 5231L,
7137L, 3938L, 3741L, 2937L, 194L), resolved = c(21, 27, 15,
16, 56, 2773, 8490, 8748, 9325, 7734, 10264, 6739, 6110,
9613, 10314, 10349, 7200, 9637, 10831, 11170, 5666), ost = c(1486,
2824, 3535, 5313, 7501, 2773, -2226, -1588, 143, -1945, -4336,
-2097, 2035, -4746, -5468, -5118, -63, -5699, -7090, -8233,
-5472)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-21L), .Names = c("date", "new", "resolved", "ost"))
Tried the following:
df1 <- df %>% group_by(dr=cut(date,breaks=as.Date(c("2016-04-10","2016-04-24")))) %>%
summarise(ost = sum(ost))
Which gives the wrong result as below.
dr ost
2016-04-10 -10586
NA -17885
Help appreciated!
We create a grouping variable 'dr' with cut
. The breaks
mentioned are the range
of the 'date' i.e. the min
and max
values of the 'date' along with the dates specified by the OP, concatenate it (c
), use the option include.lowest
and get the sum
of 'ost' based on this grouping variable.
df %>%
group_by(dr = cut(date, breaks = c(range(date),
as.Date(c("2016-04-10", "2016-04-24"))), include.lowest=TRUE) ) %>%
summarise(ost =sum(ost))
# dr ost
# <fctr> <dbl>
#1 2016-01-03 8672
#2 2016-04-10 -10586
#3 2016-04-24 -26557
Or another option is findInterval
which could be faster compared to cut
.
df %>%
group_by(dr = findInterval(date, as.Date(c("2016-04-10", "2016-04-24")))) %>%
summarise(ost = sum(ost))
# dr ost
# <int> <dbl>
#1 0 8672
#2 1 -10586
#3 2 -26557
NOTE: The OP asked the question about cut
and this solution gives that.
You can create a grouping variable first,
df %>%
mutate(group = cumsum(grepl('2016-04-10|2016-04-24', date))) %>%
group_by(group) %>%
summarise(ost = sum(ost))
#Source: local data frame [3 x 2]
# group ost
# (int) (dbl)
#1 0 8672
#2 1 -10586
#3 2 -26557
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With