I'm trying to take cumulative sums for each column of a matrix. Here's my code in R:
testMatrix = matrix(1:65536, ncol=256);
microbenchmark(apply(testMatrix, 2, cumsum), times=100L);
Unit: milliseconds
expr min lq mean median uq max neval
apply(testMatrix, 2, cumsum) 1.599051 1.766112 2.329932 2.15326 2.221538 93.84911 10000
I used Rcpp for comparison:
cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
for (int j = 0; j < m.ncol(); ++j) {
for (int i = 1; i < m.nrow(); ++i) {
m(i, j) += m(i - 1, j);
}
}
return m;
}');
microbenchmark(apply_cumsum_col(testMatrix), times=10000L);
Unit: microseconds
expr min lq mean median uq max neval
apply_cumsum_col(testMatrix) 205.833 257.719 309.9949 265.986 276.534 96398.93 10000
So the C++ code is 7.5 times as fast. Is it possible to do better than apply(testMatrix, 2, cumsum)
in pure R? It feels like I have an order of magnitude overhead for no reason.
Consider a two-dimensional input array, A: cumsum (A,1) works on successive elements in the columns of A and returns the cumulative sums of each column. cumsum (A,2) works on successive elements in the rows of A and returns the cumulative sums of each row.
For example, to calculate the cumulative sum for numbers in column B beginning in cell B2, enter the following formula in C2 and then copy it down to other cells: =SUM($B$2:B2) In your running total formula, the first reference should always be an absolute reference with the $ sign ($B$2).
B = cumsum(A) returns the cumulative sum of A starting at the beginning of the first array dimension in A whose size does not equal 1. If A is a vector, then cumsum(A) returns a vector containing the cumulative sum of the elements of A. If A is a matrix, then cumsum(A) returns a matrix containing the cumulative sums for each column of A.
In a similar manner, you can use the Excel SUM function to find the cumulative sum for your bank balance. For this, enter deposits as positive numbers, and withdrawals as negative numbers in some column (column C in this example). And then, to show the running total, enter the following formula in column D:
Maybe it is to late but I will write my answer so anyone else can see it.
First of all, in your C++ code you need to clone you matrix otherwise you are write into R's memory and it is forbiden by CRAN. So your code becomes:
rcpp_apply<-cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
NumericMatrix g=clone(m);
for (int j = 0; j < m.ncol(); ++j) {
for (int i = 1; i < m.nrow(); ++i) {
g(i, j) += g(i - 1, j);
}
}
return g;
}');
Since your matrix is typeof integer
then you can change your C++'s argument to be IntegerMatrix
.
rcpp_apply_integer<-cppFunction('IntegerMatrix apply_cumsum_col(IntegerMatrix m) {
NumericMatrix g=clone(m);
for (int j = 0; j < m.ncol(); ++j) {
for (int i = 1; i < m.nrow(); ++i) {
g(i, j) += g(i - 1, j);
}
}
return g;
}');
This impoved the code about 2 times. Here is a benchmark:
microbenchmark::microbenchmark(R=apply(testMatrix, 2, cumsum),Rcpp=rcpp_apply(testMatrix),Rcpp_integer=rcpp_apply_integer(testMatrix), times=10)
Unit: microseconds
expr min lq mean median uq max neval
R 1552.217 1706.165 1770.1264 1740.0345 1897.884 1940.989 10
Rcpp 502.900 523.838 637.7188 665.0605 699.134 743.471 10
Rcpp_integer 220.455 274.645 274.9327 275.8770 277.930 316.109 10
all.equal(rcpp_apply(testMatrix),rcpp_apply_integer(testMatrix))
[1] TRUE
If your matrix has large values then you have to use NumericMatrix
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With