Until recently, when I used NumPy methods like np.dot(A, B), only a single core was used. Since today, however, all 8 cores of my Linux machine are suddenly being used, which is a problem.
A minimal working example:
import numpy as np
N = 100
a = np.random.rand(N, N)
b = np.random.rand(N, N)
for i in range(100000):
    a = np.dot(a, b)
On my other laptop everything still runs fine on a single core. Could this be due to some new library?
This morning I updated matplotlib and cairocffi via pip, but that's all.
Any ideas how to go back to a single core?
Edit:
When I run
np.__config__.show()
I get the following output
openblas_info:
libraries = ['openblas', 'openblas']
define_macros = [('HAVE_CBLAS', None)]
language = c
library_dirs = ['/usr/local/lib']
openblas_lapack_info:
libraries = ['openblas', 'openblas']
define_macros = [('HAVE_CBLAS', None)]
language = c
library_dirs = ['/usr/local/lib']
lapack_opt_info:
libraries = ['openblas', 'openblas']
define_macros = [('HAVE_CBLAS', None)]
language = c
library_dirs = ['/usr/local/lib']
blas_mkl_info:
NOT AVAILABLE
blas_opt_info:
libraries = ['openblas', 'openblas']
define_macros = [('HAVE_CBLAS', None)]
language = c
library_dirs = ['/usr/local/lib']
This could be because NumPy is linking against multithreaded OpenBLAS libraries. Try setting this global environment variable to adjust the threading behaviour:
export OPENBLAS_MAIN_FREE=1
# Now run your python script.
Another workaround could be to use ATLAS instead of OpenBLAS. Please see this post for more information (https://shahhj.wordpress.com/2013/10/27/numpy-and-blas-no-problemo/). It proposes some other workarounds as well which might be worth trying.
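If you need to keep OpenBLAS but want to control threading from inside Python rather than via the shell, one option is the third-party threadpoolctl package (assuming it is installed, e.g. via pip install threadpoolctl); a minimal sketch:

```python
import numpy as np
from threadpoolctl import threadpool_limits  # third-party: pip install threadpoolctl

a = np.random.rand(100, 100)
b = np.random.rand(100, 100)

# Restrict every loaded BLAS library (OpenBLAS here) to a single
# thread for the duration of the with-block.
with threadpool_limits(limits=1, user_api="blas"):
    c = a.dot(b)

print(c.shape)  # (100, 100)
```

Unlike an environment variable, this takes effect at runtime and only inside the with-block, so the rest of the program can still use all cores.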
If you're using OpenBLAS, try setting the OPENBLAS_NUM_THREADS environment variable to 1. OpenBLAS uses all CPU cores, which is inefficient on small matrices.
export OPENBLAS_NUM_THREADS=1
You can read more here: https://github.com/numpy/numpy/issues/8120
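Note that OpenBLAS reads this variable when the library is loaded, so it has to be set before numpy is imported. A minimal sketch of doing that from inside the script itself:

```python
import os

# Must happen before `import numpy`, because OpenBLAS reads the
# variable once at library load time.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
c = np.dot(a, b)   # now runs on a single core
print(c.shape)     # (100, 100)
```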