It seems that my numpy library is using 4 threads, and setting OMP_NUM_THREADS=1
does not stop this.
numpy.show_config()
gives me these results:
atlas_threads_info: libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/usr/lib64/atlas'] define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] language = f77 include_dirs = ['/usr/include'] blas_opt_info: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/usr/lib64/atlas'] define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] language = c include_dirs = ['/usr/include'] atlas_blas_threads_info: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/usr/lib64/atlas'] define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] language = c include_dirs = ['/usr/include'] openblas_info: NOT AVAILABLE lapack_opt_info: libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/usr/lib64/atlas'] define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] language = f77 include_dirs = ['/usr/include']
So I know it is using blas, but I can't figure out how to make it use 1 thread for matrix multiplication.
First, numpy supports multithreading, and this can give you a speed boost in multicore environments!
Setting OMP_NUM_THREADS=1 basically turns off the OpenMP multi-threading, so each of your Python processes remains single-threaded.
Generally, Python only uses one thread to execute the set of written statements. This means that in python only one thread will be executed at a time.
Try setting all of the following:
export MKL_NUM_THREADS=1 export NUMEXPR_NUM_THREADS=1 export OMP_NUM_THREADS=1
Sometimes it's a bit tricky to see where exactly multithreading is introduced.
There are more than the 3 mentioned environmental variables. The followings are the complete list of environmental variables and the package that uses that variable to control the number of threads it spawns. Note than you need to set these variables before doing import numpy
:
OMP_NUM_THREADS: openmp, OPENBLAS_NUM_THREADS: openblas, MKL_NUM_THREADS: mkl, VECLIB_MAXIMUM_THREADS: accelerate, NUMEXPR_NUM_THREADS: numexpr
So in practice you can do:
import os os.environ["OMP_NUM_THREADS"] = "4" # export OMP_NUM_THREADS=4 os.environ["OPENBLAS_NUM_THREADS"] = "4" # export OPENBLAS_NUM_THREADS=4 os.environ["MKL_NUM_THREADS"] = "6" # export MKL_NUM_THREADS=6 os.environ["VECLIB_MAXIMUM_THREADS"] = "4" # export VECLIB_MAXIMUM_THREADS=4 os.environ["NUMEXPR_NUM_THREADS"] = "6" # export NUMEXPR_NUM_THREADS=6
Note that as of November 2018 the Numpy developers are working on making this possible to do after you do import numpy
as well. I'll update this post once they commit those changes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With