
Why is this Rcpp code slower than byte compiled R?

As the question title says, I'd like to know why the byte compiled R code (using compiler::cmpfun) is faster than equivalent Rcpp code for the following mathematical function:

func1 <- function(alpha, tau, rho, phi) {
     abs((alpha + 1)^(tau) * phi - rho * (1- (1 + alpha)^(tau))/(1 - (1 + alpha)))
}

Since this is a simple numerical operation, I would have expected Rcpp (funcCpp and funcCpp2) to be much faster than the byte compiled R (func1c and func2c), especially since R has extra overhead either for storing (1 + alpha)**tau or for recomputing it. In fact, computing this exponent twice seems faster than allocating it once as a temporary (func1c vs func2c), which seems especially counterintuitive since n is large. My other guess is that maybe compiler::cmpfun is pulling off some magic, but I'd like to know if that is indeed the case.

So really, the two things I'd like to know are:

  1. Why are funcCpp and funcCpp2 slower than func1c and func2c? (Rcpp slower than compiled R functions)

  2. Why is funcCpp slower than func2? (Rcpp code slower than pure R)

FWIW, here's my C++ and R version data

user% g++ --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin14.3.0
Thread model: posix

user% R --version
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin14.5.0 (64-bit)

And here's the R and Rcpp code:

library(Rcpp)
library(rbenchmark)

func1 <- function(alpha, tau, rho, phi) {
    abs((1 + alpha)^(tau) * phi - rho * (1- (1 + alpha)^(tau))/(1 - (1 + alpha)))
}

func2 <- function(alpha, tau, rho, phi) {
    pval <- (alpha + 1)^(tau)
    abs( pval * phi - rho * (1- pval)/(1 - (1 + alpha)))
}

func1c <- compiler::cmpfun(func1)
func2c <- compiler::cmpfun(func2)

func3c <- Rcpp::cppFunction('
    double funcCpp(double alpha, int tau, double rho, double phi) {
        double pow_val = std::exp(tau * std::log(alpha + 1.0));
        double pAg = rho/alpha;
        return std::abs(pow_val * (phi -  pAg) + pAg);
    }')

func4c <- Rcpp::cppFunction('
    double funcCpp2(double alpha, int tau, double rho, double phi) {
        double pow_val = pow(alpha + 1.0, tau) ;
        double pAg = rho/alpha;
        return std::abs(pow_val * (phi -  pAg) + pAg);
    }')
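
As a quick sanity check (not part of the benchmark itself): since 1 - (1 + alpha) is just -alpha, func1's expression rearranges to pval * (phi - rho/alpha) + rho/alpha with pval = (1 + alpha)^tau, which is exactly what the two C++ bodies compute.

stopifnot(all.equal(func1(0.01, 200, 100, 1e6),
                    funcCpp(0.01, 200, 100, 1e6)))
stopifnot(all.equal(func1(0.01, 200, 100, 1e6),
                    funcCpp2(0.01, 200, 100, 1e6)))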

res <- benchmark(
           func1(0.01, 200, 100, 1000000),
           func1c(0.01, 200, 100, 1000000),
           func2(0.01, 200, 100, 1000000),
           func2c(0.01, 200, 100, 1000000),
           func3c(0.01, 200, 100, 1000000),
           func4c(0.01, 200, 100, 1000000),
           funcCpp(0.01, 200, 100, 1000000),
           funcCpp2(0.01, 200, 100, 1000000),
           replications = 100000,
           order='relative',
           columns=c("test", "replications", "elapsed", "relative"))

And here's the output of rbenchmark:

                             test replications elapsed relative
   func1c(0.01, 200, 100, 1e+06)       100000   0.349    1.000
   func2c(0.01, 200, 100, 1e+06)       100000   0.372    1.066
 funcCpp2(0.01, 200, 100, 1e+06)       100000   0.483    1.384
   func4c(0.01, 200, 100, 1e+06)       100000   0.509    1.458
    func2(0.01, 200, 100, 1e+06)       100000   0.510    1.461
  funcCpp(0.01, 200, 100, 1e+06)       100000   0.524    1.501
   func3c(0.01, 200, 100, 1e+06)       100000   0.546    1.564
    func1(0.01, 200, 100, 1e+06)       100000   0.549    1.573
asked Oct 15 '15 by lostinarandomforest


1 Answer

This is essentially an ill-posed question. When you posit

func1 <- function(alpha, tau, rho, phi) {
     abs((alpha + 1)^(tau) * phi - rho * (1- (1 + alpha)^(tau))/(1 - (1 + alpha)))
}

without even specifying what the arguments are (i.e. scalar? vector? big? small? what memory overhead?), then in the best case you just get a small set of efficient base function calls directly from the parsed expression.

And ever since we have had the byte compiler, which Luke Tierney has kept improving in subsequent R releases, we have known that it handles algebraic expressions like this well.
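
One way to see this (a quick sketch, output omitted here) is to disassemble the compiled closure and look at the byte-code instruction listing the compiler generates for the arithmetic:

func1c <- compiler::cmpfun(func1)
compiler::disassemble(func1c)   # prints the byte-code instruction list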

Now, compiled C/C++ code does that well too -- but there is overhead in calling the compiled code, and what you see here is that for "trivial enough" problems that overhead never really gets amortized.

So you end up with pretty much a draw. No surprise as far as I can tell.
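
The flip side is that the overhead does get amortized once each call has real work to do. A minimal sketch of that, using a vector-valued alpha and a hypothetical vectorised variant funcCppVec (neither appears in the post above):

funcCppVec <- Rcpp::cppFunction('
    Rcpp::NumericVector funcCppVec(Rcpp::NumericVector alpha, double tau,
                                   double rho, double phi) {
        int n = alpha.size();
        Rcpp::NumericVector out(n);
        for (int i = 0; i < n; i++) {
            // same rearranged expression as funcCpp2, applied element-wise
            double p = std::pow(alpha[i] + 1.0, tau);
            double pAg = rho / alpha[i];
            out[i] = std::fabs(p * (phi - pAg) + pAg);
        }
        return out;
    }')

alpha <- runif(1e5, 0.001, 0.1)   # a long vector instead of a single scalar
benchmark(func1c(alpha, 200, 100, 1e6),
          funcCppVec(alpha, 200, 100, 1e6),
          replications = 100,
          order = "relative",
          columns = c("test", "replications", "elapsed", "relative"))

With inputs of this size the per-element arithmetic, not the call overhead, dominates, so the compiled loop can be expected to pull ahead.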

answered by Dirk Eddelbuettel