How to do rolling sum over columns in R?

Question

roll_sum and many other methods (e.g. https://vandomed.github.io/moving_averages.html) are for summing over rows only. I have a large matrix that I don't have enough memory to transpose it. Is there a way I can do roll_sum over columns directly?

For example:

library(roll)

A=matrix(rnorm(10000),100)
roll_sum(A,3)

But I want to do this across columns.

To follow up, all of the methods so far are implemented without using multi-core processing. Can anyone offer a solution with this feature?

Cole · Accepted Answer

Here is an rcpp approach.

Rcpp::cppFunction("
NumericMatrix rcpp_column_roll(const NumericMatrix mat, const int n) {

  const int ncol = mat.ncol();
  const int nrow = mat.nrow();
  NumericMatrix out(nrow, ncol);
  std::fill( out.begin(), out.end(), NumericVector::get_na() ) ;

  
  for (int i = 0; i < nrow; i++) {
    NumericVector window(n);
    double roll = 0;
    int oldest_ind = 0;
    
    for (int j = 0; j < n ; j++) {
      double mat_ij = mat(i, j); 
      window(j) = mat_ij;
      roll += mat_ij;
    }
    
    out(i, n - 1) = roll;

    for (int j = n; j < ncol; j ++) {
      double mat_ij = mat(i, j); 
      
      roll += mat_ij;
      roll -= window(oldest_ind);
      
      out(i, j) = roll;
      
      window(oldest_ind) = mat_ij;
      
      if (oldest_ind == n-1) oldest_ind = 0; else oldest_ind++;
    }
  }
  return(out);
}
")

This is about 10x more memory efficient than transposing the result of apply(A, 1L, roll::roll_sum, 3L) and about 50x faster for the sample dataset.

bench::mark(rcpp_column_roll(A, 3),
            t(apply(A, 1, roll::roll_sum, 3)))

## # A tibble: 2 x 13
##   expression                             min   median `itr/sec` mem_alloc
##   <bch:expr>                        <bch:tm> <bch:tm>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)             134.4us  139.7us     6641.    80.7KB
## 2 t(apply(A, 1, roll::roll_sum, 3))   7.62ms   8.91ms      101.     773KB

## With an 80 MB dataset (`rnorm(1E7)`):

##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            226ms  229ms      4.17    76.3MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 740ms  740ms      1.35   498.5MB

## 800 MB dataset (`rnorm(1E8)`):

## # A tibble: 2 x 13
##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            3.49s  3.49s     0.286  762.94MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 9.62s  9.62s     0.104    4.84GB

The memory savings seem to be stabilizing at about a 5-fold reduction and is more-or-less the allocation of the result matrix itself.

Alternatively, we can approach it more R-like and use an R loop to make a manual apply that does not need to be transposed.

out = matrix(NA_real_, nrow(A), ncol(A))
for (i in seq_len(nrow(A))) {
  out[i, ] = roll::roll_sum(A[i, ], 3L)
}

Is is moderately better than transposing the regular apply. @Moody_Mudskipper has the fastest approach although rcpp is the most memory efficient.

##rnorm(1e8); ncols = 1000;
# A tibble: 6 x 13
  expression               min median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>             <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int>
1 rcpp_column_roll(A, 3) 3.32s  3.32s     0.301  762.94MB    0         1
2 for_loop               6.12s  6.12s     0.163    2.98GB    0.327     1
3 dww_sappy                 7s     7s     0.143    4.86GB    0.572     1
4 matStat_Moody          1.81s  1.81s     0.552    2.24GB    0.552     1
5 roll_sum_Ronak         8.34s  8.34s     0.120    4.84GB    0.360     1
6 froll_Oliver           7.75s  7.75s     0.129    4.86GB    0.516     1

Note if you are really short on RAM, you can change the Rcpp function to modify the input directly, meaning you do not have to allocate another matrix. Otherwise you may be better off implementing Moody's clever solution in Rcpp as it would be faster and only need to allocate the out matrix.

Moody_Mudskipper · Answer

Since a rolling sum can be seen as a subtraction of cumsums we can use the package {MatrixStats} which does these cumsums fast.

A <- matrix(1:25,5)
A
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   11   16   21
#> [2,]    2    7   12   17   22
#> [3,]    3    8   13   18   23
#> [4,]    4    9   14   19   24
#> [5,]    5   10   15   20   25

What you can't do because of costly transpose :

library(roll)
t(roll_sum(t(A),3))
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

with {MatrixStats}

library(matrixStats)
#> Warning: le package 'matrixStats' a été compilé avec la version R 4.0.3
row_roll_sum <- function(x, width) {
out <- rowCumsums(x)
out[,seq(width+1,ncol(out))] <- out[,seq(width+1,ncol(out))] -  out[,seq(ncol(out)-width)]
out[,seq(width-1)] <- NA
out
}
row_roll_sum(A, 3)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

Dewey Brooke · Answer

Rolling Sums by Columns or Rows

`Rcpp` function for roll sums by column or row

Since it’s pretty useful to be able to do this by row or column, I included the margin argument with the same usage seen in base::apply (i.e. 1=rows, 2=columns).

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_rollsum(SEXP x, int n, int margin) {
  Rcpp::NumericMatrix y(x);
  int NR = y.nrow();
  int NC = y.ncol();
  NumericMatrix result(NR,NC);
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;

  if(margin==1){
    for(int i = 0; i < NR; ++i){
      NumericVector tmpvec = y(i,_);
      for(int j = 0; j < NC-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < NC; ++i){
      NumericVector tmpvec = y(i,_);
      for(int j = 0; j < NR-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmarks

mat_lg <- matrix(runif(1e6,1,1000),1e3,1e3)
res1 <- microbenchmark::microbenchmark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  times = 1000
)

knitr::kable(summary(res1))

expr	min	lq	mean	median	uq	max	neval	cld
matrix_rollsum	9.128677	10.38814	15.78466	13.43251	17.54006	71.10719	1000	a
rcpp_colum_roll	23.195918	26.54276	33.65227	30.43353	38.11125	113.20687	1000	b
apply_rollsum	58.027111	72.66437	87.12061	80.50741	94.53146	255.69353	1000	c
for_loop	56.408078	71.78122	85.21565	79.10471	89.47916	269.55304	1000	c
row_roll_sum	8.309067	10.40819	15.62686	12.93160	17.21942	81.76514	1000	a

Benchmarks with memory allocation

res2 <- bench::mark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  iterations = 1000
)

summary(res2)[,1:9]

# A tibble: 5 x 6
  expression           min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum    9.11ms   11.1ms      79.7   15.31MB    29.0 
2 rcpp_colum_roll   23.2ms   28.6ms      32.2    7.63MB     3.74
3 apply_rollsum    53.94ms   67.1ms      13.7   52.18MB   188.  
4 for_loop         55.18ms     69ms      13.2   33.13MB    17.8 
5 row_roll_sum      8.28ms   10.5ms      78.3   22.87MB    51.5

Benchmark Plots

p1 <- ggplot2::autoplot(res1)
p2 <- ggplot2::autoplot(res2)

library(patchwork)
p1/p2

Edit

Cole brought up a great point. Why copy a large matrix? Wouldn't working on the original object utilize less memory? So I rewrote the Rcpp function to use the original object.

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix test(NumericMatrix x, int n, int margin) {

  Rcpp::NumericMatrix result(x.nrow(),x.ncol());
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;
  double s=0.0;

  if(margin==1){
    for(int i = 0; i < x.nrow(); ++i){
      for(int j = 0; j < x.ncol()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(i,q);
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < x.ncol(); ++i){
      for(int j = 0; j < x.nrow()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(i,q);
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmarks

As Cole suspected, the new function allocated half the memory as the original, yet it was suprisingly 3x slower.

expr	min	lq	mean	median	uq	max	neval	cld
matrix_rollsum	9.317332	10.84904	15.47414	13.75330	16.36336	101.6147	1000	a
test	34.498511	40.08057	47.49839	43.26564	48.34093	211.3246	1000	b

# A tibble: 2 x 6
  expression          min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum   9.15ms   10.1ms      93.7   15.31MB    33.4 
2 test             34.1ms   35.4ms      27.5    7.63MB     3.93

dww · Answer

Using base R matrix indexing we can do

n = 3
sapply(seq_len(NCOL(A)-n+1), function(j) rowSums(A[, j:(j+n-1)]))

No transpose required, and rowSums should be pretty optimized for speed.

Ronak Shah · Answer

Perhaps, you can try using apply on matrix row-wise :

apply(A, 1, zoo::rollsumr, 3, fill = NA)
#Or
#apply(A, 1, roll::roll_sum, 3)

However, note that this will give you output in column-order format. For example,

A <- matrix(1:10, ncol = 5)
apply(A, 1, zoo::rollsumr, 3, fill = NA)

#     [,1] [,2]
#[1,]   NA   NA
#[2,]   NA   NA
#[3,]    9   12
#[4,]   15   18
#[5,]   21   24

Oliver · Answer

Both of the provided answers are equally good here. There seems to be a bit of confusion in the question whether you are looking for a rolling sum over the columns or rows, or whether your output should be transposed by design. If you are looking for the latter I'd suggest looking through Cole's answer and inverting the dimensions and indices of the output matrix.

That say, if what you're looking for is column wise operations and output, you could simply use the froll* functions from the data.table package, which are designed for speed and memory efficiency.

mat <- matrix(rnorm(1e8), ncol = 10))
frollsum = frollsum(mat, 3)

I believe the roll library has somewhat similar performance however.

How to do rolling sum over columns in R?

Tags:

r

matrix

linear-algebra

monotonic

Video Answer

6 Answers

Cole

Moody_Mudskipper

Rolling Sums by Columns or Rows

`Rcpp` function for roll sums by column or row

Benchmarks

Benchmarks with memory allocation

Benchmark Plots

Edit

Benchmarks

Dewey Brooke

dww

Ronak Shah

Oliver

Recent Activity

Donate For Us

How to do rolling sum over columns in R?

Tags:

r

matrix

linear-algebra

monotonic

Video Answer

6 Answers

Cole

Moody_Mudskipper

Rolling Sums by Columns or Rows

Rcpp function for roll sums by column or row

Benchmarks

Benchmarks with memory allocation

Benchmark Plots

Edit

Benchmarks

Dewey Brooke

dww

Ronak Shah

Oliver

Related questions

Recent Activity

Donate For Us

`Rcpp` function for roll sums by column or row