In R, matrix multiplication is very optimized, i.e. it is really just a call to BLAS/LAPACK. However, I'm surprised that this very naive C++ code for matrix-vector multiplication seems reliably 30% faster.
library(Rcpp)

# Simple C++ code for matrix multiplication
mm_code = 
"NumericVector my_mm(NumericMatrix m, NumericVector v){
  int nRow = m.rows();
  int nCol = m.cols();
  NumericVector ans(nRow);
  double v_j;
  for(int j = 0; j < nCol; j++){
    v_j = v[j];
    for(int i = 0; i < nRow; i++){
      ans[i] += m(i,j) * v_j;
    }
  }
  return(ans);
}
"

# Compiling
my_mm = cppFunction(code = mm_code)

# Simulating data to use
nRow = 10^4
nCol = 10^4

m = matrix(rnorm(nRow * nCol), nrow = nRow)
v = rnorm(nCol)

system.time(my_ans <- my_mm(m, v))
#>  user  system elapsed 
#> 0.103   0.001   0.103 

system.time(r_ans <- m %*% v)
#>  user  system elapsed 
#> 0.154   0.001   0.154 

# Double checking answer is correct
max(abs(my_ans - r_ans))
#> [1] 0
Does base R's %*% perform some type of data check that I'm skipping over?
EDIT:
After understanding what's going on (thanks SO!), it's worth noting that this is a worst-case scenario for R's %*%, i.e. matrix-by-vector multiplication. For example, @RalfStubner pointed out that an RcppArmadillo implementation of matrix-vector multiplication is even faster than the naive implementation I demonstrated (so considerably faster than base R), but is virtually identical to base R's %*% for matrix-matrix multiplication (when both matrices are large and square):
arma_code <- 
"arma::mat arma_mm(const arma::mat& m, const arma::mat& m2) {
  return m * m2;
};"
arma_mm = cppFunction(code = arma_code, depends = "RcppArmadillo")

nRow = 10^3
nCol = 10^3

mat1 = matrix(rnorm(nRow * nCol), nrow = nRow)
mat2 = matrix(rnorm(nRow * nCol), nrow = nRow)

system.time(arma_mm(mat1, mat2))
#>  user  system elapsed 
#> 0.798   0.008   0.814 

system.time(mat1 %*% mat2)
#>  user  system elapsed 
#> 0.807   0.005   0.822 
So R's current (v3.5.0) %*% is near optimal for matrix-matrix multiplication, but could be significantly sped up for matrix-vector multiplication if you're okay with skipping the checking.
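(Side note, and an assumption on my part that you are on R >= 3.4.0, where the matprod option was added: you can get that skip-the-check behaviour from base R itself, without writing any C++, by telling it to go straight to BLAS.)

# Assumes R >= 3.4.0, which added options(matprod = ...).
# "blas" skips the NaN/Inf pre-scan and calls BLAS unconditionally;
# how non-finite inputs behave then depends on the BLAS in use.
old <- options(matprod = "blas")
system.time(m %*% v)
options(old)  # restore the default check-then-dispatch behaviour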
A quick glance in names.c (here in particular) points you to do_matprod, the C function that is called by %*% and which is found in the file array.c. (Interestingly, it turns out that both crossprod and tcrossprod dispatch to that same function as well.) Here is a link to the code of do_matprod.
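To see that shared dispatch from the R prompt, here is a small check I'm adding (using the m and v defined in the question; the algebraic identities themselves are standard):

# crossprod(x, y) computes t(x) %*% y and tcrossprod(x, y) computes
# x %*% t(y), so both can reproduce m %*% v through do_matprod:
all.equal(crossprod(t(m), v), m %*% v)   # TRUE
all.equal(tcrossprod(m, t(v)), m %*% v)  # TRUE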
Scrolling through the function, you can see that it takes care of a number of things your naive implementation does not, including:

1. Keeping row and column names, where that makes sense.
2. Dispatching to alternative S4 methods when the two objects being operated on by a call to %*% are of classes for which such methods have been provided. (That's what's happening in this portion of the function.)

Near the end of the function, it dispatches to either matprod or cmatprod. Interestingly (to me at least), in the case of real matrices, if either matrix might contain NaN or Inf values, then matprod dispatches (here) to a function called simple_matprod, which is about as simple and straightforward as your own. Otherwise, it dispatches to one of a couple of BLAS Fortran routines which, presumably, are faster if uniformly 'well-behaved' matrix elements can be guaranteed.
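To make that check-then-dispatch pattern concrete, here is a minimal Rcpp sketch of it (my own illustration, not R's actual source; the name checked_mv and the m-only scan are simplifications):

checked_code = "
NumericVector checked_mv(NumericMatrix m, NumericVector v){
  int nRow = m.rows();
  int nCol = m.cols();
  // Pre-scan for NaN/Inf, analogous to the test that sends
  // do_matprod down the simple_matprod path. (R scans both
  // operands; only m is scanned here for brevity.)
  bool clean = true;
  for(int j = 0; j < nCol && clean; j++)
    for(int i = 0; i < nRow; i++)
      if(!R_FINITE(m(i,j))){ clean = false; break; }
  NumericVector ans(nRow);
  if(clean){
    // Fast path: hand off to BLAS dgemv, as R does for
    // well-behaved input.
    char trans = 'N';
    double one = 1.0, zero = 0.0;
    int ione = 1;
    F77_CALL(dgemv)(&trans, &nRow, &nCol, &one, m.begin(), &nRow,
                    v.begin(), &ione, &zero, ans.begin(), &ione);
  } else {
    // Slow path: a plain double loop, like simple_matprod, so
    // NaN/Inf propagate with R's usual semantics.
    for(int j = 0; j < nCol; j++){
      double v_j = v[j];
      for(int i = 0; i < nRow; i++) ans[i] += m(i,j) * v_j;
    }
  }
  return ans;
}"
checked_mv <- cppFunction(code = checked_code, includes = "#include <R_ext/BLAS.h>")

On finite input, checked_mv(m, v) should then match m %*% v while keeping the BLAS speed.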
Josh's answer explains why R's matrix multiplication is not as fast as this naive approach. I was curious to see how much one could gain using RcppArmadillo. The code is simple enough:
arma_code <- 
"arma::vec arma_mm(const arma::mat& m, const arma::vec& v) {
  return m * v;
};"
arma_mm = cppFunction(code = arma_code, depends = "RcppArmadillo")
Benchmark:
> microbenchmark::microbenchmark(my_mm(m,v), m %*% v, arma_mm(m,v), times = 10)
Unit: milliseconds
          expr      min       lq      mean    median        uq       max neval
   my_mm(m, v) 71.23347 75.22364  90.13766  96.88279  98.07348  98.50182    10
       m %*% v 92.86398 95.58153 106.00601 111.61335 113.66167 116.09751    10
 arma_mm(m, v) 41.13348 41.42314  41.89311  41.81979  42.39311  42.78396    10
So RcppArmadillo gives us nicer syntax and better performance.
Curiosity got the better of me. Here is a solution using BLAS directly:
blas_code = "
NumericVector blas_mm(NumericMatrix m, NumericVector v){
  int nRow = m.rows();
  int nCol = m.cols();
  NumericVector ans(nRow);
  char trans = 'N';
  double one = 1.0, zero = 0.0;
  int ione = 1;
  F77_CALL(dgemv)(&trans, &nRow, &nCol, &one, m.begin(), &nRow,
                  v.begin(), &ione, &zero, ans.begin(), &ione);
  return ans;
}"
blas_mm <- cppFunction(code = blas_code, includes = "#include <R_ext/BLAS.h>")
Benchmark:
Unit: milliseconds
          expr      min       lq      mean    median        uq       max neval
   my_mm(m, v) 72.61298 75.40050  89.75529  96.04413  96.59283  98.29938    10
       m %*% v 95.08793 98.53650 109.52715 111.93729 112.89662 128.69572    10
 arma_mm(m, v) 41.06718 41.70331  42.62366  42.47320  43.22625  45.19704    10
 blas_mm(m, v) 41.58618 42.14718  42.89853  42.68584  43.39182  44.46577    10
Armadillo and BLAS (OpenBLAS in my case) perform almost identically. And the BLAS call is what R executes in the end as well, so roughly 2/3 of R's runtime here goes to error checking, dispatch, etc.
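One caveat worth making explicit: part of that error checking is the conformability check, which the raw dgemv call skips entirely. A thin wrapper (illustrative; safe_blas_mm is not from any package) restores it cheaply:

# %*% raises "non-conformable arguments" on a dimension mismatch;
# blas_mm as written would read past the end of v instead.
safe_blas_mm <- function(m, v) {
  stopifnot(is.matrix(m), ncol(m) == length(v))
  blas_mm(m, v)
}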