Most of NumPy's functions enable multithreading by default.
For example, I work on an 8-core Intel CPU workstation. If I run this script:

```python
import numpy as np

x = np.random.random(1000000)
for i in range(100000):
    np.sqrt(x)
```
the Linux `top` command shows 800% CPU usage while it runs, which means NumPy automatically detects that my workstation has 8 cores, and `np.sqrt` automatically uses all 8 cores to accelerate the computation.
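Besides watching `top`, one way to check which native thread pools NumPy's backend has created is the third-party `threadpoolctl` package (my suggestion, not part of the original question; install it with `pip install threadpoolctl`). A minimal sketch:

```python
# Inspect the native thread pools (BLAS / OpenMP) loaded into this process.
# On an MKL-based NumPy build you would expect to see entries whose
# num_threads matches your core count (e.g. 8 on this workstation).
import numpy as np  # load NumPy first so its backend libraries are in the process
from threadpoolctl import threadpool_info

for pool in threadpool_info():
    print(pool["internal_api"], pool["num_threads"])
```

Note that `threadpool_info()` reports the BLAS and OpenMP pool sizes; it is a diagnostic, not a way to change them.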
However, I found some weird behavior. If I run this script:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10, 10)))
df + df
x = np.random.random(1000000)
for i in range(100000):
    np.sqrt(x)
```

the CPU usage is 100%! That means that if you add two pandas DataFrames before running any NumPy function, NumPy's auto-multithreading feature is gone, without any warning. This is absolutely not reasonable: why would a pandas DataFrame calculation affect NumPy's threading setting? Is it a bug? How can I work around this?
PS: I dug further using the Linux `perf` tool. Running the two scripts gives different profiles (perf output omitted): both scripts involve `libmkl_vml_avx2.so`, while the first script additionally involves `libiomp5.so`, which seems to be related to OpenMP. And since "vml" means Intel Vector Math Library, according to the VML docs I guess at least its element-wise math functions (such as `sqrt`) are all automatically multithreaded.
Pandas uses `numexpr` under the hood to calculate some operations, and `numexpr` sets the maximal number of threads for VML to 1 when it is imported:

```python
# The default for VML is 1 thread (see #39)
set_vml_num_threads(1)
```

and it gets imported by pandas when `df+df` is evaluated in `expressions.py`:

```python
from pandas.core.computation.check import _NUMEXPR_INSTALLED

if _NUMEXPR_INSTALLED:
    import numexpr as ne
```
However, the Anaconda distribution also uses VML functionality for functions such as `sqrt`, `sin`, `cos` and so on, and once `numexpr` has set the maximal number of VML threads to 1, those numpy functions no longer use parallelization.
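A minimal way to see this import-time side effect (a sketch assuming an MKL-based NumPy build, e.g. from Anaconda, with `numexpr` installed; on non-MKL builds the timings will not differ) is to time `np.sqrt` before and after importing `numexpr`:

```python
import time
import numpy as np

x = np.random.random(1_000_000)

def bench(label):
    # Time a batch of np.sqrt calls; with MKL's VML these are
    # parallelized unless the VML thread count has been capped.
    t0 = time.perf_counter()
    for _ in range(200):
        np.sqrt(x)
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.3f}s")
    return elapsed

bench("before importing numexpr")  # on MKL builds: may use all cores
import numexpr                     # caps VML threads to 1 at import time
bench("after importing numexpr")   # on MKL builds: may now run single-threaded
```

Note that pip's default OpenBLAS wheels of NumPy do not route `np.sqrt` through VML, so the slowdown only shows up on MKL builds.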
The problem can easily be seen in gdb (using your slow script):

```
>>> gdb --args python slow.py
(gdb) b mkl_serv_domain_set_num_threads
function "mkl_serv_domain_set_num_threads" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (mkl_serv_domain_set_num_threads) pending.
(gdb) run
Thread 1 "python" hit Breakpoint 1, 0x00007fffee65cd70 in mkl_serv_domain_set_num_threads ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
(gdb) bt
#0  0x00007fffee65cd70 in mkl_serv_domain_set_num_threads ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
#1  0x00007fffe978026c in _set_vml_num_threads(_object*, _object*) ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numexpr/interpreter.cpython-37m-x86_64-linux-gnu.so
#2  0x00005555556cd660 in _PyMethodDef_RawFastCallKeywords ()
   at /tmp/build/80754af9/python_1553721932202/work/Objects/call.c:694
...
(gdb) print $rdi
$1 = 1
```
i.e. we can see that `numexpr` sets the number of threads to 1, which is later used when the VML sqrt function is called:

```
(gdb) b mkl_serv_domain_get_max_threads
Breakpoint 2 at 0x7fffee65a900
(gdb) c
Continuing.
Thread 1 "python" hit Breakpoint 2, 0x00007fffee65a900 in mkl_serv_domain_get_max_threads ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
(gdb) bt
#0  0x00007fffee65a900 in mkl_serv_domain_get_max_threads ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
#1  0x00007ffff01fcea9 in mkl_vml_serv_threader_d_1i_1o ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
#2  0x00007fffedf78563 in vdSqrt ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_lp64.so
#3  0x00007ffff5ac04ac in trivial_two_operand_loop ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
```
So we can see that numpy uses VML's implementation of `vdSqrt`, which utilizes `mkl_vml_serv_threader_d_1i_1o` to decide whether the calculation should be done in parallel, and that it looks up the number of threads:

```
(gdb) fin
Run till exit from #0  0x00007fffee65a900 in mkl_serv_domain_get_max_threads ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
0x00007ffff01fcea9 in mkl_vml_serv_threader_d_1i_1o ()
   from /home/ed/anaconda37/lib/python3.7/site-packages/numpy/../../../libmkl_intel_thread.so
(gdb) print $rax
$2 = 1
```

the register `%rax` has the maximal number of threads, and it is 1.
Now we can use `numexpr` to increase the number of VML threads, i.e.:

```python
import numpy as np
import numexpr as ne
import pandas as pd

df = pd.DataFrame(np.random.random((10, 10)))
df + df

# HERE: reset the number of vml-threads
ne.set_vml_num_threads(8)

x = np.random.random(1000000)
for i in range(10000):
    np.sqrt(x)  # now in parallel
```
Now multiple cores are utilized!
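To avoid hardcoding the core count (8 above), one variant of the workaround is a small helper around the same `ne.set_vml_num_threads` call; the helper name here is my own, not a numexpr API:

```python
import os
import numexpr as ne

def restore_vml_threads(n=None):
    """Undo numexpr's import-time cap on VML threads.

    numexpr limits MKL's VML domain to 1 thread when it is imported
    (e.g. indirectly via pandas); this resets it, defaulting to all
    cores the OS reports. On non-MKL builds the call is a no-op.
    """
    nthreads = n or os.cpu_count()
    ne.set_vml_num_threads(nthreads)
    return nthreads

restore_vml_threads()  # call once, after pandas/numexpr have been imported
```

Calling it right after your pandas imports keeps the rest of the script unchanged.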