I'd like to know if it's possible to change at (Python) runtime the maximum number of threads used by OpenBLAS behind numpy?
I know it's possible to set it before running the interpreter through the environment variable OMP_NUM_THREADS
, but I'd like to change it at runtime.
Typically, when using MKL instead of OpenBLAS, it is possible:
import mkl
mkl.set_num_threads(n)
I think it would be useful and important to be able to easily set the number of threads used by Numpy after Numpy import. From the perspective of library developers, it is often useful to be able to control the number of threads used by Numpy, see for example biopython/biopython#1401.
If you don't specify one, it's # automatically detected by the the script. # NUM_THREADS = 24 ... By default OpenBLAS will try to set the maximum number of threads to use automatically, but you could try uncommenting and editing this line yourself if it is not detecting this correctly.
Firstly, I don't really understand what you mean by 'OpenBLAS + IntelMKL'. Both of those are BLAS libraries, and numpy should only link to one of them at runtime. You should probably check which of these two numpy is actually using. You can do this by calling:
note that this is actually quite tricky to tackle. NumPy does not actually know which BLAS library is used to implement the functions, it just assumes that whatever implements it uses standard BLAS apis and abis.
You can do this by calling the openblas_set_num_threads
function using ctypes
. I often find myself wanting to do this, so I wrote a little context manager:
import contextlib
import ctypes
from ctypes.util import find_library
# Prioritize hand-compiled OpenBLAS library over version in /usr/lib/
# from Ubuntu repos
try_paths = ['/opt/OpenBLAS/lib/libopenblas.so',
'/lib/libopenblas.so',
'/usr/lib/libopenblas.so.0',
find_library('openblas')]
openblas_lib = None
for libpath in try_paths:
try:
openblas_lib = ctypes.cdll.LoadLibrary(libpath)
break
except OSError:
continue
if openblas_lib is None:
raise EnvironmentError('Could not locate an OpenBLAS shared library', 2)
def set_num_threads(n):
"""Set the current number of threads used by the OpenBLAS server."""
openblas_lib.openblas_set_num_threads(int(n))
# At the time of writing these symbols were very new:
# https://github.com/xianyi/OpenBLAS/commit/65a847c
try:
openblas_lib.openblas_get_num_threads()
def get_num_threads():
"""Get the current number of threads used by the OpenBLAS server."""
return openblas_lib.openblas_get_num_threads()
except AttributeError:
def get_num_threads():
"""Dummy function (symbol not present in %s), returns -1."""
return -1
pass
try:
openblas_lib.openblas_get_num_procs()
def get_num_procs():
"""Get the total number of physical processors"""
return openblas_lib.openblas_get_num_procs()
except AttributeError:
def get_num_procs():
"""Dummy function (symbol not present), returns -1."""
return -1
pass
@contextlib.contextmanager
def num_threads(n):
"""Temporarily changes the number of OpenBLAS threads.
Example usage:
print("Before: {}".format(get_num_threads()))
with num_threads(n):
print("In thread context: {}".format(get_num_threads()))
print("After: {}".format(get_num_threads()))
"""
old_n = get_num_threads()
set_num_threads(n)
try:
yield
finally:
set_num_threads(old_n)
You can use it like this:
with num_threads(8):
np.dot(x, y)
As mentioned in the comments, openblas_get_num_threads
and openblas_get_num_procs
were very new features at the time of writing, and might therefore not be available unless you compiled OpenBLAS from the latest version of the source code.
We recently developed threadpoolctl
, a cross platform package to do control the number of threads used in calls to C-level thread-pools in python. It works similarly to the answer by @ali_m but detects automatically the libraries that needs to be limited by looping through all loaded libraries. It also comes with introspection APIs.
This package can be installed using pip install threadpoolctl
and come with a context manager that allow you to control the number of threads used by packages such as numpy
:
from threadpoolctl import threadpool_limits
import numpy as np
with threadpool_limits(limits=1, user_api='blas'):
# In this block, calls to blas implementation (like openblas or MKL)
# will be limited to use only one thread. They can thus be used jointly
# with thread-parallelism.
a = np.random.randn(1000, 1000)
a_squared = a @ a
you can also have finer control on different threadpools (such as differenciating blas
from openmp
calls).
Note: this package is still in development and any feedback is welcomed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With