Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can Rcpp replace unif function in R?

Tags:

c++

r

rcpp

I have just started using the Rcpp package in R, my learning is inspired by the Advanced R course by Hadley Wickham.

Within R studio I have the following .cpp file. The question is more general but this example helps.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector runifC(int n, double min=0, double max=1) {
  NumericVector out(n);

  for(int i = 0; i < n; ++i) {
    out[i] = min + ((double) rand() / (RAND_MAX)) * (max - min);
  }
  return out;
}

/*** R
library(microbenchmark)
microbenchmark(
  'R unif-1'      = runif(1),
  'C++ unif-1'    = runifC(1),
  'R unif-100'    = runif(100),
  'C++ unif-100'  = runifC(100),
  'R unif-1000'   = runif(1000),
  'C++ unif-1000' = runifC(1000),
  'R unif-100000'   = runif(100000),
  'C++ unif-100000' = runifC(100000)
)
*/

When I source/save the file it shows me the performance output.

Unit: nanoseconds
             expr     min        lq       mean    median        uq     max neval
         R unif-1    2061    2644.5    4000.71    3456.0    4297.0   15402   100
       C++ unif-1     710    1190.0    1815.11    1685.0    2168.5    5776   100
       R unif-100    4717    5566.5    6794.14    6563.0    7435.5   16600   100
     C++ unif-100    1450    1997.5    2663.29    2591.5    3107.0    5307   100
      R unif-1000   28210   29584.5   31310.54   30380.0   31599.0   52879   100
    C++ unif-1000    8292    8951.0   10113.78    9462.5   10121.5   25099   100
    R unif-100000 2642581 2975117.0 3104580.62 3030938.5 3119489.0 5435046   100
  C++ unif-100000  699833  990924.0 1058855.49 1034430.5 1075078.0 1530351   100

I would expect that runif would be a very optimised function but the C++ code runs much more efficiently. I might be naive here, but if there is such a difference in performance then why aren't all applicable R functions rewritten in C++?

It seems so obvious that there are a lot of improvements possible that I feel as if I am missing a huge reason of why not all R functions can be blindly copied to C++ for performance.

edit: for this example it has been shown that the C++ implementation of rand() is slightly flawed. the performance gap that I noticed most used the rand() function. performance of other functions doesn't seem as drastic so i changed the name of the question.

like image 530
cantdutchthis Avatar asked Oct 20 '14 13:10

cantdutchthis


2 Answers

Please DO NOT USE rand(). Doing so will kick your package off CRAN too should you submit it.

See eg this C++ reference page for a warning:

Notes

There are no guarantees as to the quality of the random sequence produced. In the past, some implementations of rand() have had serious shortcomings in the randomness, distribution and period of the sequence produced (in one well-known example, the low-order bit simply alternated between 1 and 0 between calls).

If you are interested in alternate random number generators and timing, the Rcpp Gallery.

In general, use the generators provided by R which are of excellent statistical quality, and offered in both scalar and vectorised form ("Rcpp Sugar") by Rcpp.

like image 183
Dirk Eddelbuettel Avatar answered Sep 17 '22 22:09

Dirk Eddelbuettel


As of R-3.1.1, runif uses the .External interface, which copies its arguments. Luke Tierney changed this to use the .Call interface in R-devel in revision 66110. The .Call interface does not copy its arguments. Rcpp uses the .Call interface.


Your C++ code is still faster under R-devel (using the .Call interface). This is likely because of differences in the random number generator being used. Also, R's functions will generally have more checks than whatever specialized code you write; and those checks take time.

like image 21
Joshua Ulrich Avatar answered Sep 18 '22 22:09

Joshua Ulrich