Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Make cumulative sum faster

Tags:

r

rcpp

I'm trying to take cumulative sums for each column of a matrix. Here's my code in R:

testMatrix = matrix(1:65536, ncol=256);
microbenchmark(apply(testMatrix, 2, cumsum), times=100L);

Unit: milliseconds
                         expr      min       lq     mean  median       uq      max neval
 apply(testMatrix, 2, cumsum) 1.599051 1.766112 2.329932 2.15326 2.221538 93.84911 10000

I used Rcpp for comparison:

cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            m(i, j) += m(i - 1, j);
        }
    }
    return m;
}');
microbenchmark(apply_cumsum_col(testMatrix), times=10000L);

Unit: microseconds
                         expr     min      lq     mean  median      uq      max neval
 apply_cumsum_col(testMatrix) 205.833 257.719 309.9949 265.986 276.534 96398.93 10000

So the C++ code is 7.5 times as fast. Is it possible to do better than apply(testMatrix, 2, cumsum) in pure R? It feels like I have an order of magnitude overhead for no reason.

like image 247
jcai Avatar asked Jun 12 '15 15:06

jcai


People also ask

What is the difference between cumsum (1) and cumulative (2)?

Consider a two-dimensional input array, A: cumsum (A,1) works on successive elements in the columns of A and returns the cumulative sums of each column. cumsum (A,2) works on successive elements in the rows of A and returns the cumulative sums of each row.

How do you calculate the cumulative sum of two numbers?

For example, to calculate the cumulative sum for numbers in column B beginning in cell B2, enter the following formula in C2 and then copy it down to other cells: =SUM($B$2:B2) In your running total formula, the first reference should always be an absolute reference with the $ sign ($B$2).

What is the difference between cumsum (a) and SUMB (a)?

B = cumsum(A) returns the cumulative sum of A starting at the beginning of the first array dimension in A whose size does not equal 1. If A is a vector, then cumsum(A) returns a vector containing the cumulative sum of the elements of A. If A is a matrix, then cumsum(A) returns a matrix containing the cumulative sums for each column of A.

How do I find the cumulative sum for my bank balance?

In a similar manner, you can use the Excel SUM function to find the cumulative sum for your bank balance. For this, enter deposits as positive numbers, and withdrawals as negative numbers in some column (column C in this example). And then, to show the running total, enter the following formula in column D:


Video Answer


1 Answers

Maybe it is to late but I will write my answer so anyone else can see it.

First of all, in your C++ code you need to clone you matrix otherwise you are write into R's memory and it is forbiden by CRAN. So your code becomes:

rcpp_apply<-cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
    NumericMatrix g=clone(m);
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            g(i, j) += g(i - 1, j);
        }
    }
    return g;
}');

Since your matrix is typeof integer then you can change your C++'s argument to be IntegerMatrix.

rcpp_apply_integer<-cppFunction('IntegerMatrix apply_cumsum_col(IntegerMatrix m) {
    NumericMatrix g=clone(m);
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            g(i, j) += g(i - 1, j);
        }
    }
    return g;
}');

This impoved the code about 2 times. Here is a benchmark:

microbenchmark::microbenchmark(R=apply(testMatrix, 2, cumsum),Rcpp=rcpp_apply(testMatrix),Rcpp_integer=rcpp_apply_integer(testMatrix), times=10)

Unit: microseconds
        expr      min       lq      mean    median       uq      max neval
           R 1552.217 1706.165 1770.1264 1740.0345 1897.884 1940.989    10
        Rcpp  502.900  523.838  637.7188  665.0605  699.134  743.471    10
Rcpp_integer  220.455  274.645  274.9327  275.8770  277.930  316.109    10



all.equal(rcpp_apply(testMatrix),rcpp_apply_integer(testMatrix))
[1] TRUE

If your matrix has large values then you have to use NumericMatrix.

like image 56
Manos Papadakis Avatar answered Nov 16 '22 02:11

Manos Papadakis