Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rolling sum in dplyr

set.seed(123)

df <- data.frame(x = sample(1:10, 20, replace = T), id = rep(1:2, each = 10))

For each id, I want to create a column which has the sum of previous 5 x values.

df %>% group_by(id) %>% mutate(roll.sum = c(x[1:4], zoo::rollapply(x, 5, sum)))
# Groups:   id [2]
  x    id roll.sum
<int> <int>    <int>
 3     1        3
 8     1        8
 5     1        5
 9     1        9
10     1       10
 1     1       36
 6     1       39
 9     1       40
 6     1       41
 5     1       37
10     2       10
 5     2        5
 7     2        7
 6     2        6
 2     2        2
 9     2       39
 3     2       32
 1     2       28
 4     2       25
10     2       29

The 6th row should be 35 (3 + 8 + 5 + 9 + 10), the 7th row should be 33 (8 + 5 + 9 + 10 + 1) and so on.

However, the above function is also including the row itself for calculation. How can I fix it?

like image 541
89_Simple Avatar asked Dec 24 '22 05:12

89_Simple


2 Answers

 library(zoo)
 df %>%  group_by(id) %>%
       mutate(Sum_prev = rollapply(x, list(-(1:5)), sum, fill=NA, align = "right", partial=F))  

#you can use rollapply(x, list((1:5)), sum, fill=NA, align = "left", partial=F) 
#to sum the next 5 elements scaping the current one 


     x id Sum_prev
 1   3  1         NA
 2   8  1         NA
 3   5  1         NA
 4   9  1         NA
 5  10  1         NA
 6   1  1         35
 7   6  1         33
 8   9  1         31
 9   6  1         35
 10  5  1         32
 11 10  2         NA
 12  5  2         NA
 13  7  2         NA
 14  6  2         NA
 15  2  2         NA
 16  9  2         30
 17  3  2         29
 18  1  2         27
 19  4  2         21
 20 10  2         19
like image 89
A. Suliman Avatar answered Jan 02 '23 09:01

A. Suliman


There is the rollify function in the tibbletime package that you could use. You can read about it in this vignette: Rolling calculations in tibbletime.

library(tibbletime)
library(dplyr)
rollig_sum <- rollify(.f = sum, window = 5)

df %>% 
  group_by(id) %>% 
  mutate(roll.sum = lag(rollig_sum(x))) #added lag() here
# A tibble: 20 x 3
# Groups:   id [2]
#       x    id roll.sum
#   <int> <int>    <int>
# 1     3     1       NA
# 2     8     1       NA
# 3     5     1       NA
# 4     9     1       NA
# 5    10     1       NA
# 6     1     1       35
# 7     6     1       33
# 8     9     1       31
# 9     6     1       35
#10     5     1       32
#11    10     2       NA
#12     5     2       NA
#13     7     2       NA
#14     6     2       NA
#15     2     2       NA
#16     9     2       30
#17     3     2       29
#18     1     2       27
#19     4     2       21
#20    10     2       19

If you want the NAs to be some other value, you can use, for example, if_else

df %>% 
  group_by(id) %>% 
  mutate(roll.sum = lag(rollig_sum(x))) %>%
  mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum))
like image 39
markus Avatar answered Jan 02 '23 11:01

markus