Make cumulative sum faster

Tags: r, rcpp

I'm trying to take cumulative sums for each column of a matrix. Here's my code in R:

library(microbenchmark)
testMatrix = matrix(1:65536, ncol=256);
microbenchmark(apply(testMatrix, 2, cumsum), times=10000L);

Unit: milliseconds
                         expr      min       lq     mean  median       uq      max neval
 apply(testMatrix, 2, cumsum) 1.599051 1.766112 2.329932 2.15326 2.221538 93.84911 10000

I used Rcpp for comparison:

library(Rcpp)
cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            m(i, j) += m(i - 1, j);
        }
    }
    return m;
}');
microbenchmark(apply_cumsum_col(testMatrix), times=10000L);

Unit: microseconds
                         expr     min      lq     mean  median      uq      max neval
 apply_cumsum_col(testMatrix) 205.833 257.719 309.9949 265.986 276.534 96398.93 10000

So the C++ code is about 7.5 times as fast by mean (roughly 8 times by median). Is it possible to do better than apply(testMatrix, 2, cumsum) in pure R? It feels like I'm paying an order of magnitude of overhead for no reason.

asked Jun 12 '15 15:06

jcai

1 Answer

Maybe it is too late, but I will write my answer so that anyone else can see it.

First of all, your C++ code needs to clone the matrix. Otherwise you write into R's memory (the caller's matrix is modified in place when it is already of type double), which is forbidden by CRAN. So your code becomes:

rcpp_apply<-cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
    NumericMatrix g=clone(m);
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            g(i, j) += g(i - 1, j);
        }
    }
    return g;
}');
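
To illustrate why the clone matters, here is a plain C++ analogy rather than Rcpp code. A std::vector stored in column-major order stands in for the memory that R owns, and both function names are hypothetical, chosen only for this sketch. The first function writes into the caller's buffer, as the original Rcpp function does. The second works on a private copy, which is the role clone(m) plays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-wise cumulative sum written directly into the caller's buffer.
// The matrix is stored column-major: element (i, j) lives at i + j * nrow.
void cumsum_cols_inplace(std::vector<double>& m, std::size_t nrow, std::size_t ncol) {
    assert(m.size() == nrow * ncol);
    for (std::size_t j = 0; j < ncol; ++j) {
        for (std::size_t i = 1; i < nrow; ++i) {
            m[i + j * nrow] += m[(i - 1) + j * nrow];
        }
    }
}

// Same computation on a private copy, so the caller's data stays untouched.
// The copy here is the analogue of `NumericMatrix g = clone(m);`.
std::vector<double> cumsum_cols_copy(const std::vector<double>& m,
                                     std::size_t nrow, std::size_t ncol) {
    std::vector<double> g = m;  // deep copy of the input
    cumsum_cols_inplace(g, nrow, ncol);
    return g;
}
```

Calling cumsum_cols_copy leaves the input as it was, while cumsum_cols_inplace overwrites it. The copy costs one extra allocation and pass over the data, which is part of why the cloned version benchmarks slower than the in-place one.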

Since your matrix is of type integer, you can change the C++ function's argument and return types to IntegerMatrix:

rcpp_apply_integer<-cppFunction('IntegerMatrix apply_cumsum_col(IntegerMatrix m) {
    IntegerMatrix g=clone(m);
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            g(i, j) += g(i - 1, j);
        }
    }
    return g;
}');

This made the code roughly twice as fast (about 2.4 times by median). Here is a benchmark:

microbenchmark::microbenchmark(R=apply(testMatrix, 2, cumsum),Rcpp=rcpp_apply(testMatrix),Rcpp_integer=rcpp_apply_integer(testMatrix), times=10)

Unit: microseconds
        expr      min       lq      mean    median       uq      max neval
           R 1552.217 1706.165 1770.1264 1740.0345 1897.884 1940.989    10
        Rcpp  502.900  523.838  637.7188  665.0605  699.134  743.471    10
Rcpp_integer  220.455  274.645  274.9327  275.8770  277.930  316.109    10



all.equal(rcpp_apply(testMatrix),rcpp_apply_integer(testMatrix))
[1] TRUE

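If your matrix has large values, so that a cumulative sum could exceed the 32-bit integer range (R's .Machine$integer.max is 2147483647), then you have to use NumericMatrix, because the integer version would overflow.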
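
One way to decide between the two versions is to check beforehand whether every running column sum stays inside the 32-bit int range. The following is a minimal standalone C++ sketch of that check, not part of Rcpp: the function name cumsum_fits_int32 is hypothetical, the matrix is assumed to be a column-major std::vector, and NA handling is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns true if every running column sum of a column-major integer matrix
// stays within the 32-bit int range, i.e. an integer result would not overflow.
bool cumsum_fits_int32(const std::vector<std::int32_t>& m,
                       std::size_t nrow, std::size_t ncol) {
    assert(m.size() == nrow * ncol);
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    for (std::size_t j = 0; j < ncol; ++j) {
        std::int64_t running = 0;  // 64-bit accumulator, so the check itself cannot overflow
        for (std::size_t i = 0; i < nrow; ++i) {
            running += m[i + j * nrow];
            if (running > hi || running < lo) {
                return false;  // an int accumulator would have overflowed here
            }
        }
    }
    return true;
}
```

For the 256 x 256 matrix in the question the largest column sum is about 1.7e7, so the integer version is safe there.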

answered Nov 16 '22 02:11

Manos Papadakis
