roll_sum and many other methods (e.g. https://vandomed.github.io/moving_averages.html) are for summing over rows only. I have a large matrix that I don't have enough memory to transpose it. Is there a way I can do roll_sum over columns directly?
For example:
library(roll)
A=matrix(rnorm(10000),100)
roll_sum(A,3)
But I want to do this across columns.
To follow up, all of the methods so far are implemented without using multi-core processing. Can anyone offer a solution with this feature?
Here is an rcpp approach.
Rcpp::cppFunction("
NumericMatrix rcpp_column_roll(const NumericMatrix mat, const int n) {
const int ncol = mat.ncol();
const int nrow = mat.nrow();
NumericMatrix out(nrow, ncol);
std::fill( out.begin(), out.end(), NumericVector::get_na() ) ;
for (int i = 0; i < nrow; i++) {
NumericVector window(n);
double roll = 0;
int oldest_ind = 0;
for (int j = 0; j < n ; j++) {
double mat_ij = mat(i, j);
window(j) = mat_ij;
roll += mat_ij;
}
out(i, n - 1) = roll;
for (int j = n; j < ncol; j ++) {
double mat_ij = mat(i, j);
roll += mat_ij;
roll -= window(oldest_ind);
out(i, j) = roll;
window(oldest_ind) = mat_ij;
if (oldest_ind == n-1) oldest_ind = 0; else oldest_ind++;
}
}
return(out);
}
")
This is about 10x more memory efficient than transposing the result of apply(A, 1L, roll::roll_sum, 3L)
and about 50x faster for the sample dataset.
bench::mark(rcpp_column_roll(A, 3),
t(apply(A, 1, roll::roll_sum, 3)))
## # A tibble: 2 x 13
## expression min median `itr/sec` mem_alloc
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3) 134.4us 139.7us 6641. 80.7KB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 7.62ms 8.91ms 101. 773KB
## With an 80 MB dataset (`rnorm(1E7)`):
## expression min median `itr/sec` mem_alloc
## <bch:expr> <bch> <bch:> <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3) 226ms 229ms 4.17 76.3MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 740ms 740ms 1.35 498.5MB
## 800 MB dataset (`rnorm(1E8)`):
## # A tibble: 2 x 13
## expression min median `itr/sec` mem_alloc
## <bch:expr> <bch> <bch:> <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3) 3.49s 3.49s 0.286 762.94MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 9.62s 9.62s 0.104 4.84GB
The memory savings seem to be stabilizing at about a 5-fold reduction and is more-or-less the allocation of the result matrix itself.
Alternatively, we can approach it more R-like and use an R loop to make a manual apply
that does not need to be transposed.
out = matrix(NA_real_, nrow(A), ncol(A))
for (i in seq_len(nrow(A))) {
out[i, ] = roll::roll_sum(A[i, ], 3L)
}
Is is moderately better than transposing the regular apply
. @Moody_Mudskipper has the fastest approach although rcpp is the most memory efficient.
##rnorm(1e8); ncols = 1000;
# A tibble: 6 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr
<bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl> <int>
1 rcpp_column_roll(A, 3) 3.32s 3.32s 0.301 762.94MB 0 1
2 for_loop 6.12s 6.12s 0.163 2.98GB 0.327 1
3 dww_sappy 7s 7s 0.143 4.86GB 0.572 1
4 matStat_Moody 1.81s 1.81s 0.552 2.24GB 0.552 1
5 roll_sum_Ronak 8.34s 8.34s 0.120 4.84GB 0.360 1
6 froll_Oliver 7.75s 7.75s 0.129 4.86GB 0.516 1
Note if you are really short on RAM, you can change the Rcpp function to modify the input directly, meaning you do not have to allocate another matrix. Otherwise you may be better off implementing Moody's clever solution in Rcpp as it would be faster and only need to allocate the out matrix.
Since a rolling sum can be seen as a subtraction of cumsums we can use the package {MatrixStats} which does these cumsums fast.
A <- matrix(1:25,5)
A
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 6 11 16 21
#> [2,] 2 7 12 17 22
#> [3,] 3 8 13 18 23
#> [4,] 4 9 14 19 24
#> [5,] 5 10 15 20 25
What you can't do because of costly transpose :
library(roll)
t(roll_sum(t(A),3))
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA NA 18 33 48
#> [2,] NA NA 21 36 51
#> [3,] NA NA 24 39 54
#> [4,] NA NA 27 42 57
#> [5,] NA NA 30 45 60
with {MatrixStats}
library(matrixStats)
#> Warning: le package 'matrixStats' a été compilé avec la version R 4.0.3
row_roll_sum <- function(x, width) {
out <- rowCumsums(x)
out[,seq(width+1,ncol(out))] <- out[,seq(width+1,ncol(out))] - out[,seq(ncol(out)-width)]
out[,seq(width-1)] <- NA
out
}
row_roll_sum(A, 3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NA NA 18 33 48
#> [2,] NA NA 21 36 51
#> [3,] NA NA 24 39 54
#> [4,] NA NA 27 42 57
#> [5,] NA NA 30 45 60
Rcpp
function for roll sums by column or rowSince it’s pretty useful to be able to do this by row or column, I
included the margin
argument with the same usage seen in base::apply
(i.e. 1=rows, 2=columns).
#include <Rcpp.h>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_rollsum(SEXP x, int n, int margin) {
Rcpp::NumericMatrix y(x);
int NR = y.nrow();
int NC = y.ncol();
NumericMatrix result(NR,NC);
std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;
if(margin==1){
for(int i = 0; i < NR; ++i){
NumericVector tmpvec = y(i,_);
for(int j = 0; j < NC-n+1;++j){
double s=0.0;
for(int q=j; q<j+n;q++){
s+=tmpvec[q];
}
result(i,j+n-1) = s;
s = 0.0;
}}}
if(margin==2){
for(int i = 0; i < NC; ++i){
NumericVector tmpvec = y(i,_);
for(int j = 0; j < NR-n+1;++j){
double s=0.0;
for(int q=j; q<j+n;q++){
s+=tmpvec[q];
}
result(j+n-1,i) = s;
s = 0.0;
}}}
return result;
}
mat_lg <- matrix(runif(1e6,1,1000),1e3,1e3)
res1 <- microbenchmark::microbenchmark(
matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
rcpp_colum_roll = rcpp_column_roll(mat_lg,3),
apply_rollsum = apply_rollsum(mat_lg,3),
for_loop = for_loop(mat_lg,3),
row_roll_sum = row_roll_sum(mat_lg,width = 3),
times = 1000
)
knitr::kable(summary(res1))
expr | min | lq | mean | median | uq | max | neval | cld |
---|---|---|---|---|---|---|---|---|
matrix_rollsum | 9.128677 | 10.38814 | 15.78466 | 13.43251 | 17.54006 | 71.10719 | 1000 | a |
rcpp_colum_roll | 23.195918 | 26.54276 | 33.65227 | 30.43353 | 38.11125 | 113.20687 | 1000 | b |
apply_rollsum | 58.027111 | 72.66437 | 87.12061 | 80.50741 | 94.53146 | 255.69353 | 1000 | c |
for_loop | 56.408078 | 71.78122 | 85.21565 | 79.10471 | 89.47916 | 269.55304 | 1000 | c |
row_roll_sum | 8.309067 | 10.40819 | 15.62686 | 12.93160 | 17.21942 | 81.76514 | 1000 | a |
res2 <- bench::mark(
matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
rcpp_colum_roll = rcpp_column_roll(mat_lg,3),
apply_rollsum = apply_rollsum(mat_lg,3),
for_loop = for_loop(mat_lg,3),
row_roll_sum = row_roll_sum(mat_lg,width = 3),
iterations = 1000
)
summary(res2)[,1:9]
# A tibble: 5 x 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 matrix_rollsum 9.11ms 11.1ms 79.7 15.31MB 29.0
2 rcpp_colum_roll 23.2ms 28.6ms 32.2 7.63MB 3.74
3 apply_rollsum 53.94ms 67.1ms 13.7 52.18MB 188.
4 for_loop 55.18ms 69ms 13.2 33.13MB 17.8
5 row_roll_sum 8.28ms 10.5ms 78.3 22.87MB 51.5
p1 <- ggplot2::autoplot(res1)
p2 <- ggplot2::autoplot(res2)
library(patchwork)
p1/p2
Cole brought up a great point. Why copy a large matrix? Wouldn't working on the original object utilize less memory? So I rewrote the Rcpp
function to use the original object.
#include <Rcpp.h>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
Rcpp::NumericMatrix test(NumericMatrix x, int n, int margin) {
Rcpp::NumericMatrix result(x.nrow(),x.ncol());
std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;
double s=0.0;
if(margin==1){
for(int i = 0; i < x.nrow(); ++i){
for(int j = 0; j < x.ncol()-n+1;++j){
for(int q=j; q<j+n;q++){
s+=x(i,q);
}
result(i,j+n-1) = s;
s = 0.0;
}}}
if(margin==2){
for(int i = 0; i < x.ncol(); ++i){
for(int j = 0; j < x.nrow()-n+1;++j){
for(int q=j; q<j+n;q++){
s+=x(i,q);
}
result(j+n-1,i) = s;
s = 0.0;
}}}
return result;
}
As Cole suspected, the new function allocated half the memory as the original, yet it was suprisingly 3x slower.
expr | min | lq | mean | median | uq | max | neval | cld |
---|---|---|---|---|---|---|---|---|
matrix_rollsum | 9.317332 | 10.84904 | 15.47414 | 13.75330 | 16.36336 | 101.6147 | 1000 | a |
test | 34.498511 | 40.08057 | 47.49839 | 43.26564 | 48.34093 | 211.3246 | 1000 | b |
# A tibble: 2 x 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 matrix_rollsum 9.15ms 10.1ms 93.7 15.31MB 33.4
2 test 34.1ms 35.4ms 27.5 7.63MB 3.93
Using base R matrix indexing we can do
n = 3
sapply(seq_len(NCOL(A)-n+1), function(j) rowSums(A[, j:(j+n-1)]))
No transpose required, and rowSums
should be pretty optimized for speed.
Perhaps, you can try using apply
on matrix row-wise :
apply(A, 1, zoo::rollsumr, 3, fill = NA)
#Or
#apply(A, 1, roll::roll_sum, 3)
However, note that this will give you output in column-order format. For example,
A <- matrix(1:10, ncol = 5)
apply(A, 1, zoo::rollsumr, 3, fill = NA)
# [,1] [,2]
#[1,] NA NA
#[2,] NA NA
#[3,] 9 12
#[4,] 15 18
#[5,] 21 24
Both of the provided answers are equally good here. There seems to be a bit of confusion in the question whether you are looking for a rolling sum over the columns or rows, or whether your output should be transposed by design. If you are looking for the latter I'd suggest looking through Cole's answer and inverting the dimensions and indices of the output matrix.
That say, if what you're looking for is column wise operations and output, you could simply use the froll*
functions from the data.table
package, which are designed for speed and memory efficiency.
mat <- matrix(rnorm(1e8), ncol = 10))
frollsum = frollsum(mat, 3)
I believe the roll
library has somewhat similar performance however.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With