Should I prefer Rcpp::NumericVector over std::vector?

Tags: c++, rcpp

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

For example, the two functions below

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
  Rcpp::NumericVector tmp(x.length());
  for (int i = 0; i < x.length(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
  std::vector<double> tmp(x.size());
  for (int i = 0; i < x.size(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

Are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if it is only about taking an R vector as input and returning a vector as output, would it make any difference which of the two I use? Can using std::vector<double> lead to any problems when interacting with R?

Tim asked Jan 11 '17 22:01


2 Answers

Are equivalent when considering their working and benchmarked performance.

  1. I doubt that the benchmarks are accurate because going from a SEXP to std::vector<double> requires a deep copy from one data structure to another. (And as I was typing this, @DirkEddelbuettel ran a microbenchmark.)
  2. The markup of the Rcpp object (e.g. const Rcpp::NumericVector& x) is just visual sugar. An Rcpp object is a thin wrapper around a pointer to the underlying R data, so a modification can easily ripple through to every object that shares that data (see below). Thus, there is no true counterpart to const std::vector<double>& x, which effectively "locks" the data and "passes a reference".

Can using std::vector<double> lead to any possible problems when interacting with R?

In short, no. The only penalty that is paid is the transference between objects.

What you gain in exchange for this transference is isolation. With NumericVector, modifying a vector that was assigned to another NumericVector causes a domino update, because both refer to the same data. With std::vector<T>, each vector is an independent deep copy of the other, so no such update occurs. Therefore, the following couldn't happen with std::vector<T>:

#include <Rcpp.h>
using namespace Rcpp;   // NumericVector and Rcout are used unqualified below

// [[Rcpp::export]]
void test_copy() {
    NumericVector A = NumericVector::create(1, 2, 3);
    NumericVector B = A;   // shallow copy: B refers to the same R vector as A

    Rcout << "Before: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl;

    A[1] = 5; // 2 -> 5, visible through B as well

    Rcout << "After: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl;
}

Gives:

test_copy()
# Before: 
# A: 1 2 3
# B: 1 2 3
# After: 
# A: 1 5 3
# B: 1 5 3
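
For contrast, here is a minimal sketch of the same experiment with std::vector<double>, written as plain C++ with no Rcpp involved. The function name copy_then_modify is made up for illustration. Assigning one std::vector to another copies the elements, so changing A leaves B alone.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper mirroring test_copy() above, but with std::vector.
// Returns the final state of (A, B) so the caller can inspect both.
std::pair<std::vector<double>, std::vector<double>> copy_then_modify() {
    std::vector<double> A = {1.0, 2.0, 3.0};
    std::vector<double> B = A;   // deep copy: B owns its own storage

    A[1] = 5.0;                  // 2 -> 5, only in A

    return std::make_pair(A, B);
}
```

Here the returned pair holds A = 1 5 3 and B = 1 2 3. Within Rcpp, Rcpp::clone(A) is the way to ask for the same kind of independent copy of a NumericVector.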

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

There are a few reasons:

  1. As hinted previously, using Rcpp::NumericVector avoids a deep copy to and from the C++ std::vector<T>.
  2. You gain access to the sugar functions.
  3. You can 'mark up' an Rcpp object in C++ (e.g. by adding attributes via .attr()).

coatless answered Nov 06 '22 05:11


"If unsure, just time it."

All it takes is to add these few lines to the file you already had:

/*** R
library(microbenchmark)
x <- 1.0* 1:1e7   # make sure it is numeric
microbenchmark(foo(x), bar(x), times=100L)
*/

Then just calling sourceCpp("...yourfile...") generates the following result (plus warnings on signed/unsigned comparisons):

R> library(microbenchmark)

R> x <- 1.0* 1:1e7   # make sure it is numeric

R> microbenchmark(foo(x), bar(x), times=100L)
Unit: milliseconds
   expr     min      lq    mean  median      uq      max neval cld
 foo(x) 31.6496 31.7396 32.3967 31.7806 31.9186  54.3499   100  a 
 bar(x) 50.9229 51.0602 53.5471 51.1811 51.5200 147.4450   100   b
R> 
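
The signed/unsigned warnings mentioned above most likely come from bar(), which compares an int index with x.size(), an unsigned value. One way to silence them, sketched here as plain C++, is to index with std::size_t. The name bar_sizet is made up for illustration, and under sourceCpp() the function would additionally need the Rcpp header and the export attribute from the question.

```cpp
#include <cstddef>
#include <vector>

// Same logic as bar() from the question, with an unsigned loop index
// so that `i < x.size()` compares two values of the same signedness.
std::vector<double> bar_sizet(const std::vector<double>& x) {
    std::vector<double> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        tmp[i] = x[i] + 1.0;
    return tmp;
}
```

This changes only the index type; the copies made at the R boundary, and therefore the timings, should be unaffected.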

Your bar() solution needs to make a copy to create an R object in the R memory pool. foo() does not. That matters for large vectors that you run over many times. Here we see a ratio of about 1.6 between the median timings (51.2 ms vs. 31.8 ms).
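
To make the extra copying concrete, here is a rough model in plain C++. It is not Rcpp's actual conversion code: a raw double buffer stands in for the memory R owns, and both function names are invented for illustration. The first function works on the buffers directly, as foo() does through NumericVector. The second copies into a std::vector and back out, which is roughly what the std::vector signature of bar() implies.

```cpp
#include <cstddef>
#include <vector>

// Models foo(): read the R-owned input and write the R-owned output directly.
void add_one_direct(const double* in, std::size_t n, double* out) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] + 1.0;
}

// Models bar(): copy in, compute on the std::vector, copy the result out.
void add_one_via_vector(const double* in, std::size_t n, double* out) {
    std::vector<double> x(in, in + n);          // copy 1: buffer -> std::vector
    std::vector<double> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        tmp[i] = x[i] + 1.0;
    for (std::size_t i = 0; i < n; ++i)         // copy 2: std::vector -> buffer
        out[i] = tmp[i];
}
```

Both functions produce the same values. The second simply passes over the data more often, which is consistent with the slower timings for bar().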

In practice, it may not matter, for example if you simply prefer one coding style over the other.

Dirk Eddelbuettel answered Nov 06 '22 06:11