I observed that matrix-vector multiplication with a large matrix is significantly slower in R (version 3.6.1) than in Matlab (version 2019b), although both languages rely on the (same?) BLAS library. Below is a minimal example:
In Matlab:
n=900;
p=900;
A=reshape(1:(n*p),[n,p]);
x=ones(p,1);
tic()
for id = 1:1000
x = A*x;
end
toc()
In R:
n=900
p=900
A=matrix(c(1:(n*p)),nrow=n,ncol=p)
x=rep(1,ncol(A))
t0 <- Sys.time()
for(iter in 1:1000){
x = A%*%x
}
t1 <- Sys.time()
print(t1-t0)
I get an execution time of roughly 0.05 s in Matlab versus 3.5 s in R on the same computer. Any idea what causes such a difference?
Thanks.
[EDIT]: I add below a similar computation in C (using the CBLAS library; compiled with gcc cblas_dgemv.c -lblas -o cblas_dgemv, where cblas_dgemv.c denotes the source file below). I get a running time of roughly 0.08 s, which is quite close to the running time obtained using Matlab (0.05 s). I am still trying to figure out the reason for the huge running time in R (3.5 s).
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include "cblas.h"
int main(int argc, char **argv)
{
int m=900,adr;
double *A,*x,*y;
struct timeval t0,t1;
/* memory allocation and initialization */
A = (double*)malloc(m*m*sizeof(double));
x = (double*)malloc(m*sizeof(double));
y = (double*)malloc(m*sizeof(double));
for(adr=0;adr<m;adr++) x[adr] = 1.;
for(adr=0;adr<m*m;adr++) A[adr] = adr;
/* main loop */
gettimeofday(&t0, NULL);
for(adr=0;adr<1000;adr++)
cblas_dgemv(CblasColMajor,CblasNoTrans,m,m,1.,A,m,x,1,0.,y,1);
gettimeofday(&t1, NULL);
printf("elapsed time = %.2e seconds\n",(double)(t1.tv_usec-t0.tv_usec)/1000000. + (double)(t1.tv_sec-t0.tv_sec));
/* free memory */
free(A);
free(x);
free(y);
return EXIT_SUCCESS;
}
Notice that I was not able to set y=x in the cblas_dgemv routine, so this C computation is slightly different from the one done in the R and Matlab codes above. However, the compilation was done without an optimization flag (no -O3 option), and I checked that the matrix-vector product was indeed called at each iteration of the loop (performing 10x more iterations leads to a 10x longer running time).
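For completeness, the iterated update x = A*x from the R and Matlab codes can be mimicked by copying y back into x after each product. A minimal sketch, replacing the main loop above, using cblas_dcopy (the extra O(m) copy per iteration is negligible next to the O(m^2) product):
/* main loop: iterated update x = A*x, with y as a temporary buffer */
for(adr=0;adr<1000;adr++){
    cblas_dgemv(CblasColMajor,CblasNoTrans,m,m,1.,A,m,x,1,0.,y,1); /* y = A*x */
    cblas_dcopy(m,y,1,x,1); /* x = y */
}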
Because MATLAB is a programming language developed first and foremost for numerical linear algebra (matrix manipulations), it ships with libraries especially tuned for matrix multiplication.
Here's something kind of shocking:
The precompiled R distribution that is downloaded from CRAN makes use of the reference BLAS/LAPACK implementation for linear algebra operations
The "reference BLAS" is the non-optimized, non-accelerated BLAS, unlike OpenBLAS or Intel MKL. Matlab uses MKL, which is accelerated.
This seems to be confirmed by sessionInfo() in my R 3.6.0 on macOS:
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
If I'm reading this right, that means that by default, R uses a slow BLAS, and if you want it to go fast, you need to do some configuration to get it to use a fast BLAS instead.
This is a little surprising to me. As I understand it, the reference BLAS is generally used for testing and development, not for "actual work".
I get about the same timings as you in R 3.6 vs Matlab R2019b on macOS 10.14: 0.04 seconds in Matlab, 4.5 seconds in R. I think that's consistent with R using a non-accelerated BLAS.
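For reference, the R for macOS FAQ describes a way to point a CRAN build of R at Apple's accelerated BLAS (the Accelerate framework). A sketch, assuming a standard R 3.6 framework install; double-check the FAQ for your version, since the paths may differ:
# swap R's reference BLAS for Apple's Accelerate BLAS
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib libRblas.dylib
# to revert to the shipped reference BLAS
ln -sf libRblas.0.dylib libRblas.dylib
On Linux, the usual route is to install an optimized BLAS such as OpenBLAS and select it through the distribution's alternatives mechanism, or to build R from source with --with-blas.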
I used Rcpp to interface a C++ implementation of the matrix-vector product with R.
Source C++ file ('cblas_dgemv2.cpp'): performs the product y = A*x 'niter' times.
#include <cblas-openblas.h>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::NumericVector cblas_dgemv2(Rcpp::NumericMatrix A, Rcpp::NumericVector x, int niter)
{
int m=A.ncol(),iter;
Rcpp::NumericVector y(m);
for(iter=0;iter<niter;iter++)
cblas_dgemv(CblasColMajor,CblasNoTrans,m,m,1.,A.begin(),m,x.begin(),1,0.,y.begin(),1);
return y;
}
Then I perform two experiments using the R code below: first, a single call y=cblas_dgemv2(A,x,1000), which computes the product y=A*x 1000 times in C++; second, 1000 calls y=cblas_dgemv2(A,x,1), each computing the product y=A*x once in C++.
# compile cblas_dgemv2 (you may need to update the path to the library)
PKG_LIBS <- '/usr/lib/x86_64-linux-gnu/libopenblas.a'
PKG_CPPFLAGS <- '-I/usr/include/x86_64-linux-gnu'
Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS)
Rcpp::sourceCpp('cblas_dgemv2.cpp', rebuild = TRUE)
# create A and x
n=900
A=matrix(c(1:(n*n)),nrow=n,ncol=n)
x=rep(1,n)
# Experiment 1: call 1 time cblas_dgemv2 (with 1000 iterations in C++)
t0 <- Sys.time()
y=cblas_dgemv2(A,x,1000) # perform 1000 times the computation y = A%*%x
t1 <- Sys.time()
print(t1-t0)
# Experiment 2: call 1000 times cblas_dgemv2
t0 <- Sys.time()
for(iter in 1:1000){
y=cblas_dgemv2(A,x,1) # perform the computation y = A%*%x once
}
t1 <- Sys.time()
print(t1-t0)
I get a running time of 0.08s for the first experiment versus 4.8s for the second experiment.
My conclusion: the bottleneck in terms of running time comes from the data transfers between R and C++, not from the computation of the matrix-vector product itself. Surprising, isn't it?
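One way to tighten this comparison (a sketch, assuming the microbenchmark package is installed, with A, x and cblas_dgemv2 as defined above):
# compare one call doing 1000 products against one call doing a single product
library(microbenchmark)
microbenchmark(
  single = cblas_dgemv2(A, x, 1),    # 1 product + 1 R/C++ round trip
  batch  = cblas_dgemv2(A, x, 1000), # 1000 products + 1 R/C++ round trip
  times = 50
)
If the transfer dominates, 'single' should cost nearly as much per call as 'batch', despite doing 1000 times less arithmetic.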