
Fast LAPACK/BLAS for matrix multiplication

Tags: c++, armadillo

I'm exploring the Armadillo C++ library for linear algebra at the moment. As far as I understand, it uses the LAPACK/BLAS libraries for basic matrix operations (e.g. matrix multiplication). As a Windows user, I downloaded LAPACK/BLAS from here: http://icl.cs.utk.edu/lapack-for-windows/lapack/#running. The problem is that matrix multiplications are very slow compared to Matlab or even R. For example, Matlab multiplies two 1000x1000 matrices in ~0.15 seconds on my computer and R needs ~1 second, while C++/Armadillo/LAPACK/BLAS needs more than 10 seconds.

Matlab is evidently based on highly optimized linear algebra libraries. My question is whether there is a faster LAPACK/BLAS library to use from Armadillo. Alternatively, is there a way to extract Matlab's linear algebra libraries somehow and use them from C++?

Kasablanca asked Jul 14 '13 11:07



2 Answers

LAPACK doesn't do matrix multiplication. It's BLAS that provides matrix multiplication.
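To make the distinction concrete, here is a minimal sketch of what a BLAS GEMM-style routine computes: C = alpha * A * B + beta * C. The function name `naive_gemm` is made up for illustration and is not a BLAS routine; it only shows the semantics, with none of the blocking, vectorisation or threading that an optimized BLAS uses. Matrices are assumed to be stored column-major, as BLAS and Armadillo do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics only: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n, all stored column-major.
// naive_gemm is a hypothetical name, not part of any BLAS library.
void naive_gemm(std::size_t m, std::size_t n, std::size_t k,
                double alpha,
                const std::vector<double>& A,
                const std::vector<double>& B,
                double beta,
                std::vector<double>& C) {
    // Sanity check on the buffer sizes (compiled out when NDEBUG is defined).
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    for (std::size_t j = 0; j < n; ++j) {          // column of C
        for (std::size_t i = 0; i < m; ++i) {      // row of C
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p) {  // inner dimension
                sum += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] = alpha * sum + beta * C[i + j * m];
        }
    }
}
```

A plain triple loop like this is the kind of code that takes seconds on 1000x1000 matrices; an optimized BLAS computes the same result far faster.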

If you have a 64-bit operating system, I recommend first trying a 64-bit version of BLAS. This can roughly double performance straight away.

Secondly, have a look at a high-performance implementation of BLAS, such as OpenBLAS. OpenBLAS uses both vectorisation and parallelisation (i.e. multi-core). It is a free (no cost) open source project.

Matlab internally uses the Intel MKL library, which you can also use with the Armadillo library. Intel MKL is closed source, but is free for non-commercial use. Note that OpenBLAS can obtain matrix multiplication performance that is on par with or better than Intel MKL.

Note that high performance linear algebra is generally easier to accomplish on Linux and Mac OS X than on Windows.

mtall answered Sep 28 '22 00:09



Adding to what has already been said, you should also use a high level of optimization:

  1. Be sure to use either the -O2 or the -O3 compiler flag.

  2. Link to the above-mentioned high-performance (and possibly multi-threaded) BLAS libraries. AFAIK MKL is only freely available for Unix platforms, though; if you're using a Unix-like environment such as Cygwin inside Windows, this should be OK, I guess. OpenBLAS is also multi-threaded.

  3. In many libraries, defining the symbol NDEBUG (e.g. by passing the compiler flag -DNDEBUG) turns off costly range checking and assertions. Armadillo has its own symbol, called ARMA_NO_DEBUG, which you can either define manually or enable by editing the config.hpp header file (located in the Armadillo include directory) and uncommenting the corresponding line. Since you were able to turn on external BLAS usage in Armadillo, I am guessing you are already familiar with this config file anyway.
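As a rough illustration of point 3, the sketch below shows how a range check written with the standard `assert` macro disappears once NDEBUG is defined. The helpers `checked_at` and `range_checks_enabled` are hypothetical names for this example only. Armadillo's own checks are controlled by ARMA_NO_DEBUG rather than by this macro, so treat this as an analogy, not as Armadillo's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element accessor with a debug-only range check.
// When the file is compiled with -DNDEBUG, the assert expands to nothing,
// so the check costs nothing at run time (and no longer protects you).
inline double checked_at(const std::vector<double>& v, std::size_t i) {
    assert(i < v.size() && "index out of range");
    return v[i];
}

// Reports whether assert-based checks were compiled into this translation unit.
constexpr bool range_checks_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

With GCC or Clang, an optimized build without these checks would typically be requested with flags along the lines of `-O3 -DNDEBUG -DARMA_NO_DEBUG`, plus whatever linker flags your BLAS installation needs.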

I did a quick comparison between Armadillo with MKL BLAS and Matlab on my Intel Core i7 workstation. For the C++ executable I used -O3 and MKL BLAS, and had ARMA_NO_DEBUG defined. I multiplied 1000x1000 random matrices 100 times and averaged the multiplication times. The C++ implementation was roughly 1.5 times faster than Matlab.
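The exact timing code used for this comparison is not given, so the following is only a sketch of how such an average could be measured. `average_seconds` is a hypothetical helper built on `std::chrono`; it runs a callable a given number of times and returns the mean wall-clock time per run.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls `work` `repetitions` times and returns the
// mean wall-clock time per call, in seconds.
template <typename Work>
double average_seconds(Work&& work, int repetitions) {
    assert(repetitions > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    const auto start = clock::now();
    for (int r = 0; r < repetitions; ++r) {
        work();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;

    return elapsed.count() / repetitions;
}
```

With Armadillo, and assuming `A` and `B` are `arma::mat` objects created as `arma::mat A(1000, 1000, arma::fill::randu)` and `C` is an `arma::mat` for the result, the measurement would be `average_seconds([&] { C = A * B; }, 100)`.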

Hope this helps

Darkdragon84 answered Sep 27 '22 22:09
