
Multithreading on numpy/pandas matrix multiplication?

I really want to know how to utilize multi-core processing for matrix multiplication on numpy/pandas.

What I'm trying is here:

M = pd.DataFrame(...) # super high dimensional square matrix.
A = M.T.dot(M) 

This takes a huge amount of processing time because of the many sums of products, and I think multithreading is a natural fit for huge matrix multiplications. I googled carefully, but I can't find how to do that with numpy/pandas. Do I need to write multithreaded code manually with a Python built-in threading library?

asked Apr 04 '14 by Light Yagmi


2 Answers

In NumPy, multithreaded matrix multiplication can be achieved with a multithreaded implementation of BLAS, the Basic Linear Algebra Subroutines. You need to:

  1. Have such a BLAS implementation; OpenBLAS, ATLAS and MKL all include multithreaded matrix multiplication.
  2. Have a NumPy compiled to use such an implementation.
  3. Make sure the matrices you're multiplying both have a dtype of float32 or float64 (and meet certain alignment restrictions; I recommend using NumPy 1.7.1 or later where these have been relaxed).

A few caveats apply:

  • Older versions of OpenBLAS, when compiled with GCC, run into trouble in programs that use multiprocessing, which includes most applications that use joblib. In particular, they will hang. The cause is a bug (or a missing feature) in GCC. A patch has been submitted but is not yet included in the mainline sources.
  • The ATLAS packages you find in a typical Linux distro may or may not be compiled to use multithreading.

As for Pandas: I'm not sure how it does dot products. Convert to NumPy arrays and back to be sure.
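A minimal sketch of that round trip (to_numpy() exists on newer pandas; on older versions the equivalent attribute is .values):

```python
import numpy as np
import pandas as pd

M = pd.DataFrame(np.random.rand(500, 500))

X = M.to_numpy()                 # use M.values on older pandas versions
A = np.dot(X.T, X)               # plain NumPy arrays, so BLAS handles the product

# Wrap the result back in a DataFrame, keeping the column labels.
A_df = pd.DataFrame(A, index=M.columns, columns=M.columns)
```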

answered Oct 12 '22 by Fred Foo


First of all, I would also propose converting to NumPy arrays and using NumPy's dot function. If you have access to an MKL build, which is more or less the fastest implementation at the moment, you should try setting the environment variable OMP_NUM_THREADS. This should activate the other cores of your system. On my Mac it seems to work properly. In addition, I would try np.einsum, which seems to be faster than np.dot.
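A sketch of both suggestions; note that OMP_NUM_THREADS must be set before NumPy is imported for it to take effect, and that the thread count of 4 here is just an illustrative value:

```python
import os
os.environ["OMP_NUM_THREADS"] = "4"   # must be set before importing numpy

import numpy as np

M = np.random.rand(800, 800)

# 'ji,jk->ik' computes sum_j M[j,i] * M[j,k], i.e. the same result as M.T @ M
A = np.einsum('ji,jk->ik', M, M)
```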

But pay attention! If you have compiled a multithreaded library that uses OpenMP for parallelisation (like MKL), you have to consider that the "default gcc" on all Apple systems is not GCC; it is Clang/LLVM, and Clang is not able to build with OpenMP support at the moment, unless you use the OpenMP trunk, which is still experimental. So you have to install the Intel compiler or any other compiler that supports OpenMP.

answered Oct 13 '22 by lemitech