I'm trying to understand a difference in performance between a function written in RcppArmadillo and one written in a standalone C++ program using the Armadillo library. For example, consider the following simple function that computes the coefficients for a linear model using the traditional textbook formula.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
void simpleLm(NumericMatrix Xr, NumericMatrix yr) {
int n = Xr.nrow(), k = Xr.ncol();
mat X(Xr.begin(), n, k, false);
colvec y(yr.begin(), yr.nrow(), false);
colvec coef = inv(X.t()*X)*X.t()*y;
}
This takes about 6 seconds to run with a 1000000x100
matrix for X
. Some timings in the code (not shown) indicated that all the time is spent on the coef
calculation.
X <- matrix(rnorm(1000000*100), ncol=100)
y <- matrix(rep(1, 1000000))
system.time(simpleLm(X,y))
user system elapsed
6.028 0.009 6.040
Now consider a very similar function written in C++ that is then compiled with g++
.
#include <iostream>
#include <armadillo>
#include <chrono>
#include <cstdlib>
using namespace std;
using namespace arma;
int main(int argc, char **argv) {
int n = 1000000;
mat X = randu<mat>(n,100);
vec y = ones<vec>(n);
chrono::steady_clock::time_point start = chrono::steady_clock::now();
colvec coef = inv(X.t()*X)*X.t()*y;
chrono::steady_clock::time_point end = chrono::steady_clock::now();
chrono::duration<double, milli> diff = end - start;
cout << diff.count() << endl;
return 0;
}
Here the calculation of the coef
variable only takes about 0.5 seconds, or only 1/12th the time as when done with RcppArmadillo.
I'm using Mac OS X 10.9.2 with R 3.1.0, Rcpp 0.11.1 and RcppArmadillo 0.4.200.0. I compiled the Rcpp example using the sourceCpp function. The standalone C++ example uses Armadillo 4.200.0, and I also installed the Fortran compiler for Mac using Homebrew (brew install gfortran
).
Quick guess: your native program uses accelerated BLAS, you R build does not.
The actual "matrix math" is farmed out by Armadillo to the BLAS library. With RcppArmadillo, you get what R is built against. With a native program, maybe you use something else. It could be as simple as your program getting to use the Accelerate libraries whereas R doesn't -- I don't really know as I don't use OS X.
But to demonstrate, on my (i7, Linux) machine, times are near identical.
First, your program, unaltered:
edd@max:/tmp$ g++ -std=c++11 -O3 -o abiel abiel.cpp -larmadillo -llapack
edd@max:/tmp$ ./abiel
2454
edd@max:/tmp$
Second, your program wrapped into something R can call (see below):
R> library(Rcpp)
R> sourceCpp("/tmp/abielviaR.cpp")
R> abielDemo()
2354.41
[1] TRUE
R>
About the same.
The code of abielviaR.cpp
follows.
#include <RcppArmadillo.h>
#include <chrono>
using namespace std;
using namespace arma;
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
bool abielDemo() {
int n = 1000000;
mat X = randu<mat>(n,100);
vec y = ones<vec>(n);
chrono::steady_clock::time_point start = chrono::steady_clock::now();
colvec coef = inv(X.t()*X)*X.t()*y;
chrono::steady_clock::time_point end = chrono::steady_clock::now();
chrono::duration<double, milli> diff = end - start;
Rcpp::Rcout << diff.count() << endl;
return true;
}
PS You really should not compute OLS via (X'X)^(-1) X though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With