I want to calculate a sum over a sliding window on grouped data.
As I would like to stick to official functions if possible I started with rollapplyr like this:
library(tidyverse)
library(reshape2)
library(zoo)  
data = data.frame(Count=seq(1,10,1),
                  group=c("A","B","A","A","B","B","B","B","A","A"))
window_size <- 3    
data_rolling <- data %>%
  arrange(group) %>%
  group_by(group) %>%
  mutate(Rolling_Count = rollapplyr(Count, width=window_size, FUN=sum, fill = NA)) %>%
  ungroup()
for the first entries which are smaller than width (in this case 3) it gets filled with NA as defined, but I would actually like to have the sum of the possible data there like this:
 Count group Rolling_Count expected_Result
 1     A            NA    1
 3     A            NA    4
 4     A            8     8
 9     A            16    16
10     A            23    23
 2     B            NA    2
 5     B            NA    7
 6     B            13    13
 7     B            18    18
 8     B            21    21
I know that I can replace the width=window_size with something like this:
c(rep(1:window_size,1),rep(window_size:window_size,(n()-window_size)))
to get what I want but this is really slow. In addition this approach would assume that n() is greater than window_size.
So: Is there already an R/zoo function which can handle grouped data like above and in addition data with less than window_size entries and is faster to the above approach?
Thanks for any hints!
A solution based on data.table and RcppRoll that should be much more performant.
It's not as clean as I would like -- there's actually a partial argument in RcppRoll::roll_sum() that hasn't been implemented yet that would theoretically solve this cleanly, but it doesn't seem like that will be worked anytime soon-- see GH Issue #18 . 
At any rate, until someone implements a rolling sum in R that allows what you need here, adding in a cumsum on the first n - 1 rows seems to be a sensible solution.
library(data.table)
library(RcppRoll)
data = data.frame(Count=seq(1,10,1),
                  group=c("A","B","A","A","B","B","B","B","A","A"))
## Convert to a `data.table` by reference
setDT(data)
window_size <- 3    
## Add a counter row so that we can go back and fill in rows
## 1 & 2 of each group
data[,Group_RowNumber := seq_len(.N), keyby = .(group)]
## Do a rolling window -- this won't fill in the first 2 rows
data[,Rolling_Count := RcppRoll::roll_sum(Count,
                                          n = window_size,
                                          align = "right",
                                          fill = NA), keyby = .(group)]
## Go back and fill in the ones we missed
data[Group_RowNumber < window_size, Rolling_Count := cumsum(Count), by = .(group)]
data
#     Count group Group_RowNumber Rolling_Count
#  1:     1     A               1             1
#  2:     3     A               2             4
#  3:     4     A               3             8
#  4:     9     A               4            16
#  5:    10     A               5            23
#  6:     2     B               1             2
#  7:     5     B               2             7
#  8:     6     B               3            13
#  9:     7     B               4            18
# 10:     8     B               5            21
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With