Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?
For example, the two functions below
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
    Rcpp::NumericVector tmp(x.length());
    for (int i = 0; i < x.length(); i++)
        tmp[i] = x[i] + 1.0;
    return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
    std::vector<double> tmp(x.size());
    for (int i = 0; i < x.size(); i++)
        tmp[i] = x[i] + 1.0;
    return tmp;
}
are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if it is only about taking R's vector as input and returning a vector as output, would it make any difference which one of those I use? Can using std::vector<double> lead to any possible problems when interacting with R?
"Are equivalent when considering their working and benchmarked performance."
Going from a SEXP to std::vector<double> requires a deep copy from one data structure to another. (And as I was typing this, @DirkEddelbuettel ran a microbenchmark.)
The const reference in the Rcpp signature (const Rcpp::NumericVector& x) is just visual sugar. By default, the object given is a pointer and as such can easily have a ripple modification effect (see below). Thus, there is no true match for const std::vector<double>& x, which effectively "locks" the object and "passes a reference".
"Can using std::vector<double> lead to any possible problems when interacting with R?"
In short, no. The only penalty that is paid is the transference between objects.
The gain from this transference is that modifying a value will not cause a domino update, as it does when one NumericVector is assigned to another NumericVector. In essence, each std::vector<T> is a direct copy of the other. Therefore, the following couldn't happen:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void test_copy() {
    NumericVector A = NumericVector::create(1, 2, 3);
    NumericVector B = A;  // B refers to the same underlying R vector as A
    Rcout << "Before: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl;
    A[1] = 5; // 2 -> 5
    Rcout << "After: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl;
}
Gives:
test_copy()
# Before:
# A: 1 2 3
# B: 1 2 3
# After:
# A: 1 5 3
# B: 1 5 3
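To make the contrast concrete without needing R, here is a minimal standalone sketch in plain C++. SharedVec is a hypothetical stand-in (it is not an Rcpp type) that mimics a handle whose copies share one buffer, as B = A does in test_copy() above; the std::vector version copies its elements instead. The function names are illustrative assumptions as well.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a handle type whose copies share one buffer,
// mimicking the A/B behaviour shown in test_copy(). This is NOT Rcpp code.
struct SharedVec {
    std::shared_ptr<std::vector<double>> data;
    explicit SharedVec(std::vector<double> v)
        : data(std::make_shared<std::vector<double>>(std::move(v))) {}
    double& operator[](std::size_t i) { return (*data)[i]; }
};

// Copy a shared handle, modify the original, and report what the copy sees.
double shared_copy_sees() {
    SharedVec a(std::vector<double>{1.0, 2.0, 3.0});
    SharedVec b = a;   // copies the handle only; both point at one buffer
    a[1] = 5.0;        // the change is visible through b as well
    return b[1];
}

// Copy a std::vector, modify the original, and report what the copy sees.
double deep_copy_sees() {
    std::vector<double> a{1.0, 2.0, 3.0};
    std::vector<double> b = a;  // deep copy: b owns its own elements
    a[1] = 5.0;                 // b is unaffected
    return b[1];
}
```

Here shared_copy_sees() returns 5 while deep_copy_sees() returns 2. In Rcpp itself, Rcpp::clone() is the usual way to ask for an independent copy of a NumericVector.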
"Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?"
There are a few reasons:
- Rcpp::NumericVector avoids a deep copy to and fro the C++ std::vector<T>.
- Attributes can be set on the Rcpp object from C++ (e.g. via .attr()).
"If unsure, just time it."
All it takes is to add these few lines to the file you already had:
/*** R
library(microbenchmark)
x <- 1.0* 1:1e7 # make sure it is numeric
microbenchmark(foo(x), bar(x), times=100L)
*/
Then just calling sourceCpp("...yourfile...") generates the following result (plus warnings on signed/unsigned comparisons):
R> library(microbenchmark)
R> x <- 1.0* 1:1e7 # make sure it is numeric
R> microbenchmark(foo(x), bar(x), times=100L)
Unit: milliseconds
   expr     min      lq    mean  median      uq      max neval cld
 foo(x) 31.6496 31.7396 32.3967 31.7806 31.9186  54.3499   100  a
 bar(x) 50.9229 51.0602 53.5471 51.1811 51.5200 147.4450   100   b
R>
Your bar() solution needs to make a copy to create an R object in the R memory pool. foo() does not. That matters for large vectors that you run over many times. Here we see a ratio of about 1.6 between the median timings (51.18 ms versus 31.78 ms).
In practice, it may not matter if you prefer one coding style over the other etc pp.
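As for the signed/unsigned comparison warnings mentioned above: in bar() they arise because an int index is compared with x.size(), which is unsigned. A minimal sketch of the same loop with a std::size_t index, written as plain C++ so that it compiles without Rcpp; the name add_one is a hypothetical stand-in for bar() minus the export attribute:

```cpp
#include <cstddef>
#include <vector>

// Same computation as bar(), without the Rcpp export attribute.
std::vector<double> add_one(const std::vector<double>& x) {
    std::vector<double> tmp(x.size());
    // std::size_t is unsigned, like the value returned by size(),
    // so the comparison no longer mixes signed and unsigned values.
    for (std::size_t i = 0; i < x.size(); ++i)
        tmp[i] = x[i] + 1.0;
    return tmp;
}
```

This only silences the warning; it does not change the copying behaviour discussed above.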