
matrix by vector multiplication - R versus Matlab

I observed that matrix-by-vector multiplication with a large matrix is significantly slower in R (version 3.6.1) than in Matlab (version 2019b), even though both languages rely on the (same?) BLAS library. A minimal example is below.

In Matlab:

n=900; 
p=900; 
A=reshape(1:(n*p),[n,p]); 
x=ones(p,1); 
tic()
for id = 1:1000
  x = A*x; 
end
toc()

In R:

n=900
p=900
A=matrix(c(1:(n*p)),nrow=n,ncol=p)
x=rep(1,ncol(A))
t0 <- Sys.time()
for(iter in 1:1000){
  x = A%*%x
}
t1 <- Sys.time()
print(t1-t0)

I get a running time of roughly 0.05 s in Matlab versus 3.5 s in R on the same computer. Any idea what causes such a difference?

Thanks.

[EDIT]: Below is a similar computation in C, using the CBLAS library (compiled with gcc cblas_dgemv.c -lblas -o cblas_dgemv, where cblas_dgemv.c is the source file below). I get a running time of roughly 0.08 s, which is quite close to the running time obtained in Matlab (0.05 s). I am still trying to figure out the reason for the huge running time in R (3.5 s).

#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include "cblas.h"

int main(int argc, char **argv)
{
  int m=900,adr;
  double *A,*x,*y;
  struct timeval t0,t1;

  /* memory allocation and initialization */
  A = (double*)malloc(m*m*sizeof(double)); 
  x = (double*)malloc(m*sizeof(double));  
  y = (double*)malloc(m*sizeof(double));  
  for(adr=0;adr<m;adr++) x[adr] = 1.; 
  for(adr=0;adr<m*m;adr++) A[adr] = adr;

  /* main loop */
  gettimeofday(&t0, NULL);
  for(adr=0;adr<1000;adr++)
    cblas_dgemv(CblasColMajor,CblasNoTrans,m,m,1.,A,m,x,1,0.,y,1);
  gettimeofday(&t1, NULL);
  printf("elapsed time = %.2e seconds\n",(double)(t1.tv_usec-t0.tv_usec)/1000000. + (double)(t1.tv_sec-t0.tv_sec));

  /* free memory */
  free(A);
  free(x);
  free(y); 

  return EXIT_SUCCESS;
}

Notice that I was not able to set y = x in the cblas_dgemv routine (BLAS gemv does not allow the output vector to overlap the input), so this C computation differs slightly from the R and Matlab code above. However, the compilation was done without an optimization flag (no -O3), and I checked that the matrix-vector product was indeed called at each iteration of the loop (performing 10x more iterations leads to a 10x longer running time).

asked Feb 14 '20 by RémyA



2 Answers

Here's something kind of shocking:

The precompiled R distribution that is downloaded from CRAN makes use of the reference BLAS/LAPACK implementation for linear algebra operations

The "reference BLAS" is the non-optimized, non-accelerated BLAS, unlike OpenBLAS or Intel MKL. Matlab uses MKL, which is accelerated.

This seems to be confirmed by sessionInfo() in my R 3.6.0 on macOS:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

If I'm reading this right, that means that by default R uses a slow BLAS, and if you want it to go fast, you need to configure it to use a fast BLAS instead (for example, by building R against OpenBLAS, or on Debian/Ubuntu by switching the system-wide BLAS with update-alternatives).

This is a little surprising to me. As I understand it, the reference BLAS is generally primarily used for testing and development, not for "actual work".

I get about the same timings as you in R 3.6 vs Matlab R2019b on macOS 10.14: 0.04 seconds in Matlab, 4.5 seconds in R. I think that's consistent with R using a non-accelerated BLAS.

answered Oct 16 '22 by Andrew Janke


I used Rcpp to interface a C++ implementation of the matrix-vector product with R.

Source C++ file ('cblas_dgemv2.cpp'): performs the product y = A*x 'niter' times

#include <cblas-openblas.h>
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericVector cblas_dgemv2(Rcpp::NumericMatrix A, Rcpp::NumericVector x, int niter)
{
  int m=A.ncol(),iter;
  Rcpp::NumericVector y(m);
  for(iter=0;iter<niter;iter++)
    cblas_dgemv(CblasColMajor,CblasNoTrans,m,m,1.,A.begin(),m,x.begin(),1,0.,y.begin(),1);
  return y; 
}

Then I perform two experiments using the R code below:

  • Experiment 1: from R, call y=cblas_dgemv2(A,x,1000) once, which performs the product y=A*x 1000 times in C++;
  • Experiment 2: from R, call y=cblas_dgemv2(A,x,1) 1000 times, each call performing the product y=A*x once in C++.

# compile cblas_dgemv2 (you may need to update the path to the library)
PKG_LIBS <- '/usr/lib/x86_64-linux-gnu/libopenblas.a' 
PKG_CPPFLAGS <- '-I/usr/include/x86_64-linux-gnu'
Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS) 
Rcpp::sourceCpp('cblas_dgemv2.cpp', rebuild = TRUE)

# create A and x 
n=900
A=matrix(c(1:(n*n)),nrow=n,ncol=n)
x=rep(1,n)

# Experiment 1: call 1 time cblas_dgemv2 (with 1000 iterations in C++)
t0 <- Sys.time()
y=cblas_dgemv2(A,x,1000) # perform 1000 times the computation y = A%*%x 
t1 <- Sys.time()
print(t1-t0)

# Experiment 2: call 1000 times cblas_dgemv2  
t0 <- Sys.time()
for(iter in 1:1000){
  y=cblas_dgemv2(A,x,1) # perform the computation y = A%*%x once
}
t1 <- Sys.time()
print(t1-t0)

I get a running time of 0.08s for the first experiment versus 4.8s for the second experiment.

My conclusion: the bottleneck in running time comes from the data transfers between R and C++, not from the computation of the matrix-vector product itself. Surprising, isn't it?

answered Oct 16 '22 by RémyA