i have a matrix data frame 6940 rows and 100 columns. I need to find 5 days cumulative at a time on the data set. Right now I was able to build a for loop code for this as follows :
cum<- matrix(data=q1,nrow=6940,ncol=100)
for (j in 1:100){
for (i in 1:6940){
cum[i,j]<-sum(q1[i,j],q1[i+1,j],q1[i+2,j],q1[i+3,j],q1[i+4,j],na.rm=T)
}
}
I wanted to know whether there is any function in apply family to do the same, as this code is very time consuming.
for example if i generate a data frame using the command
ens <- matrix(rnorm(200),20)
I want cumulative sum of 5 rows a time. i.e sum of row1:row5, row2:row6, row3:row7 and so on in a form of data frame.
i tried using apply function in this form :
apply(apply(apply(apply( apply(m, 2, cumsum),2, cumsum), 2, cumsum),2,cumsum),2,cumsum)
but the problem is I don't get the cumulative in blocks of 5, only an overall cumulative.
Here is one approach using the stats::filter function to calculate the rolling sums and apply to loop over the columns:
m <- matrix(1:48, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 13 25 37
# [2,] 2 14 26 38
# [3,] 3 15 27 39
# [4,] 4 16 28 40
# [5,] 5 17 29 41
# [6,] 6 18 30 42
# [7,] 7 19 31 43
# [8,] 8 20 32 44
# [9,] 9 21 33 45
#[10,] 10 22 34 46
#[11,] 11 23 35 47
#[12,] 12 24 36 48
apply(m, 2, filter, filter = rep(1, 5), sides = 1)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
This might have to be adjusted depending on how you want to handle windows with less than 5 values (e.g., here in the beginning).
Another option is roll_sum (Data from @Roland's post)
library(RcppRoll)
apply(m, 2, roll_sumr, 5)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
As @alexis_laz mentioned in the comments, roll_sumr can take matrix as well. It is more efficient.
roll_sumr(m, 5, by = 1)
set.seed(24)
m1 <- matrix(sample(1:50, 5000*5000, replace=TRUE), ncol=5000)
system.time(apply(m1, 2, roll_sumr, 5))
# user system elapsed
# 1.84 0.16 1.99
system.time(roll_sumr(m1, 5, by = 1))
# user system elapsed
# 0.59 0.15 0.74
system.time(apply(m1, 2, stats::filter, filter = rep(1, 5), sides = 1))
# user system elapsed
# 4.46 0.20 4.68
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With