I am new to C++
programming (using Rcpp
for seamless integration into R
), and I would appreciate some advice on how to speed up some calculations.
Consider the following example:
testmat <- matrix(1:9, nrow=3)
testvec <- 1:3
testmat*testvec
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 4 10 16
#[3,] 9 18 27
Here, R
recycled testvec
so that, loosely speaking, testvec
"became" a matrix of the same dimensions as testmat
for the purpose of this multiplication. Then the Hadamard product is returned. I wish to implement this behavior using Rcpp
, that is I want that each element of the i
-th row in the matrix testmat
is multiplied with the i
-th element of the vector testvec
. My benchmarks tell me that my implementations are extremely slow, and I would appreciate advise on how to speed this up. Here my code:
First, using Eigen
:
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
using namespace Rcpp;
using namespace Eigen;
// [[Rcpp::export]]
NumericMatrix E_matvecprod_elwise(NumericMatrix Xs, NumericVector ys){
Map<MatrixXd> X(as<Map<MatrixXd> >(Xs));
Map<VectorXd> y(as<Map<VectorXd> >(ys));
int k = X.cols();
int n = X.rows();
MatrixXd Y(n,k) ;
// here, I emulate R's recycling. I did not find an easier way of doing this. Any hint appreciated.
for(int i = 0; i < k; ++i) {
Y.col(i) = y;
}
MatrixXd out = X.cwiseProduct(Y);
return wrap(out);
}
Here my implementation using Armadillo
(adjusted to follow Dirk's example, see answer below):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::mat A_matvecprod_elwise(const arma::mat & X, const arma::vec & y){
int k = X.n_cols ;
arma::mat Y = repmat(y, 1, k) ; //
arma::mat out = X % Y;
return out;
}
Benchmarking these solutions using R, Eigen or Armadillo shows that both Eigen and Armadillo are about 2 times slower than R. Is there a way to speed these computations up or to get at least as fast as R? Are there more elegant ways of setting this up? Any advise is appreciated and welcome. (I also encourage tangential remarks about programming style in general as I am new to Rcpp / C++
.)
Here some reproducable benchmarks:
# for comparison, define R function:
R_matvecprod_elwise <- function(mat, vec) mat*vec
n <- 50000
k <- 50
X <- matrix(rnorm(n*k), nrow=n)
e <- rnorm(n)
benchmark(R_matvecprod_elwise(X, e), A2_matvecprod_elwise(X, e), E_matvecprod_elwise(X,e),
columns = c("test", "replications", "elapsed", "relative"), order = "relative", replications = 1000)
This yields
test replications elapsed relative
1 R_matvecprod_elwise(X, e) 1000 10.89 1.000
2 A_matvecprod_elwise(X, e) 1000 26.87 2.467
3 E_matvecprod_elwise(X, e) 1000 27.73 2.546
As you can see, my Rcpp
-solutions perform quite miserably. Any way to do it better?
If you want to speed up your calculations you will have to be a little careful about not making copies. This usually means sacrificing readability. Here is a version which makes no copies and modifies matrix X inplace.
// [[Rcpp::export]]
NumericMatrix Rcpp_matvecprod_elwise(NumericMatrix & X, NumericVector & y){
unsigned int ncol = X.ncol();
unsigned int nrow = X.nrow();
int counter = 0;
for (unsigned int j=0; j<ncol; j++) {
for (unsigned int i=0; i<nrow; i++) {
X[counter++] *= y[i];
}
}
return X;
}
Here is what I get on my machine
> library(microbenchmark)
> microbenchmark(R=R_matvecprod_elwise(X, e), Arma=A_matvecprod_elwise(X, e), Rcpp=Rcpp_matvecprod_elwise(X, e))
Unit: milliseconds
expr min lq median uq max neval
R 8.262845 9.386214 10.542599 11.53498 12.77650 100
Arma 18.852685 19.872929 22.782958 26.35522 83.93213 100
Rcpp 6.391219 6.640780 6.940111 7.32773 7.72021 100
> all.equal(R_matvecprod_elwise(X, e), Rcpp_matvecprod_elwise(X, e))
[1] TRUE
For starters, I'd write the Armadillo version (interface) as
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arama::mat A_matvecprod_elwise(const arma::mat & X, const arma::vec & y){
int k = X.n_cols ;
arma::mat Y = repmat(y, 1, k) ; //
arma::mat out = X % Y;
return out;
}
as you're doing an additional conversion in and out (though the wrap()
gets added by the glue code). The const &
is notional (as you learned via your last question, a SEXP
is a pointer object that is lightweight to copy) but better style.
You didn't show your benchmark results so I can't comment on the effect of matrix size etc pp. I suspect you might get better answers on rcpp-devel than here. Your pick.
Edit: If you really want something cheap and fast, I would just do this:
// [[Rcpp::export]]
mat cheapHadamard(mat X, vec y) {
// should row dim of X versus length of Y here
for (unsigned int i=0; i<y.n_elem; i++) X.row(i) *= y(i);
return X;
}
which allocates no new memory and will hence be faster, and probably be competitive with R.
Test output:
R> cheapHadamard(testmat, testvec)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 4 10 16
[3,] 9 18 27
R>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With