
Multithreading on numpy/pandas matrix multiplication?

I really want to know how to utilize multi-core processing for matrix multiplication on numpy/pandas.

What I'm trying is here:

M = pd.DataFrame(...) # super high dimensional square matrix.
A = M.T.dot(M) 

This takes a huge amount of processing time because of the many sums of products, and I think multithreading is a natural fit for huge matrix multiplications. I googled carefully, but I can't find how to do that with numpy/pandas. Do I need to write multithreaded code manually with a Python built-in threading library?

asked Apr 04 '14 by Light Yagmi


2 Answers

In NumPy, multithreaded matrix multiplication can be achieved with a multithreaded implementation of BLAS, the Basic Linear Algebra Subroutines. You need to:

  1. Have such a BLAS implementation; OpenBLAS, ATLAS and MKL all include multithreaded matrix multiplication.
  2. Have a NumPy compiled to use such an implementation.
  3. Make sure the matrices you're multiplying both have a dtype of float32 or float64 (and meet certain alignment restrictions; I recommend using NumPy 1.7.1 or later where these have been relaxed).

A few caveats apply:

  • Older versions of OpenBLAS, when compiled with GCC, run into trouble in programs that use multiprocessing, which includes most applications that use joblib. In particular, they will hang. The cause is a bug (or a missing feature) in GCC. A patch has been submitted but is not yet included in the mainline sources.
  • The ATLAS packages you find in a typical Linux distro may or may not be compiled to use multithreading.

As for Pandas: I'm not sure how it does dot products. Convert to NumPy arrays and back to be sure.
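A minimal sketch of that round trip (to_numpy() exists on newer pandas; on older versions the equivalent attribute is .values):

```python
import numpy as np
import pandas as pd

M = pd.DataFrame(np.random.rand(500, 500))

X = M.to_numpy()                 # use M.values on older pandas versions
A = np.dot(X.T, X)               # plain NumPy arrays, so BLAS handles the product

# Wrap the result back in a DataFrame, keeping the column labels.
A_df = pd.DataFrame(A, index=M.columns, columns=M.columns)
```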

answered Oct 12 '22 by Fred Foo


First of all, I would also propose converting to NumPy arrays and using NumPy's dot function. If you have access to an MKL build, which is more or less the fastest implementation at the moment, you should try setting the environment variable OMP_NUM_THREADS. This should activate the other cores of your system. On my Mac it seems to work properly. In addition, I would try np.einsum, which seems to be faster than np.dot.
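A sketch of both suggestions; note that OMP_NUM_THREADS must be set before NumPy is imported for it to take effect, and that the thread count of 4 here is just an illustrative value:

```python
import os
os.environ["OMP_NUM_THREADS"] = "4"   # must be set before importing numpy

import numpy as np

M = np.random.rand(800, 800)

# 'ji,jk->ik' computes sum_j M[j,i] * M[j,k], i.e. the same result as M.T @ M
A = np.einsum('ji,jk->ik', M, M)
```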

But pay attention! If you have compiled a multithreaded library that uses OpenMP for parallelisation (like MKL), you have to consider that the "default gcc" on all Apple systems is not GCC; it is Clang/LLVM, and Clang is not able to build with OpenMP support at the moment, unless you use the OpenMP trunk, which is still experimental. So you have to install the Intel compiler or any other compiler that supports OpenMP.

answered Oct 13 '22 by lemitech