 

Set max number of threads at runtime on numpy/openblas

I'd like to know whether it's possible to change the maximum number of threads used by OpenBLAS behind numpy at (Python) runtime.

I know it's possible to set it before running the interpreter through the environment variable OMP_NUM_THREADS, but I'd like to change it at runtime.
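For comparison, the environment-variable route can also be taken from inside a Python script, as long as it runs before numpy (and therefore OpenBLAS) is first imported in the process. The libraries read these variables when they initialize, so changing them afterwards has no effect, which is why this does not answer the runtime question. This is a minimal sketch: the helper name `limit_blas_threads_via_env` is hypothetical, and `OPENBLAS_NUM_THREADS` / `MKL_NUM_THREADS` are the backend-specific variables for OpenBLAS and MKL.

```python
import os


def limit_blas_threads_via_env(n):
    """Hypothetical helper: request n BLAS threads via environment variables.

    Only effective if called before numpy is imported for the first time,
    because OpenBLAS/MKL read these variables when they initialize.
    """
    value = str(int(n))
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = value


limit_blas_threads_via_env(2)
# import numpy as np  # import numpy only after the variables are set
```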

For instance, when using MKL instead of OpenBLAS, this is possible:

import mkl
mkl.set_num_threads(n)
Théo T, asked Apr 10 '15 10:04




2 Answers

You can do this by calling the openblas_set_num_threads function using ctypes. I often find myself wanting to do this, so I wrote a little context manager:

import contextlib
import ctypes
from ctypes.util import find_library

# Prioritize hand-compiled OpenBLAS library over version in /usr/lib/
# from Ubuntu repos
try_paths = ['/opt/OpenBLAS/lib/libopenblas.so',
             '/lib/libopenblas.so',
             '/usr/lib/libopenblas.so.0',
             find_library('openblas')]
openblas_lib = None
for libpath in try_paths:
    # find_library() returns None when nothing is found, and
    # LoadLibrary(None) would load the main program instead of failing
    if libpath is None:
        continue
    try:
        openblas_lib = ctypes.cdll.LoadLibrary(libpath)
        break
    except OSError:
        continue
if openblas_lib is None:
    raise OSError('Could not locate an OpenBLAS shared library')


def set_num_threads(n):
    """Set the current number of threads used by the OpenBLAS server."""
    openblas_lib.openblas_set_num_threads(int(n))


# At the time of writing these symbols were very new:
# https://github.com/xianyi/OpenBLAS/commit/65a847c
# (ctypes raises AttributeError for missing symbols, so hasattr() is a safe check)
if hasattr(openblas_lib, 'openblas_get_num_threads'):
    def get_num_threads():
        """Get the current number of threads used by the OpenBLAS server."""
        return openblas_lib.openblas_get_num_threads()
else:
    def get_num_threads():
        """Dummy function (symbol not present), returns -1."""
        return -1

if hasattr(openblas_lib, 'openblas_get_num_procs'):
    def get_num_procs():
        """Get the number of processors detected by OpenBLAS."""
        return openblas_lib.openblas_get_num_procs()
else:
    def get_num_procs():
        """Dummy function (symbol not present), returns -1."""
        return -1


@contextlib.contextmanager
def num_threads(n):
    """Temporarily changes the number of OpenBLAS threads.

    Example usage:

        print("Before: {}".format(get_num_threads()))
        with num_threads(n):
            print("In thread context: {}".format(get_num_threads()))
        print("After: {}".format(get_num_threads()))
    """
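    old_n = get_num_threads()
    set_num_threads(n)
    try:
        yield
    finally:
        # get_num_threads() returns -1 if the symbol is missing; don't
        # "restore" that bogus value
        if old_n > 0:
            set_num_threads(old_n)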

You can use it like this:

with num_threads(8):
    np.dot(x, y)

As mentioned in the code comments, openblas_get_num_threads and openblas_get_num_procs were very new features at the time of writing, so they might not be available unless you compiled OpenBLAS from recent source.
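The same save-and-restore pattern is not specific to OpenBLAS: it only needs a getter and a setter, so it can also wrap the MKL calls shown in the question. This is a minimal sketch; `temporary_threads` is a hypothetical name, and it skips the restore when the getter returns a non-positive value, as the dummy `get_num_threads` fallback does when the symbol is missing.

```python
import contextlib


@contextlib.contextmanager
def temporary_threads(get_threads, set_threads, n):
    """Temporarily set a thread count via arbitrary getter/setter callables.

    If the getter reports a non-positive value (e.g. a dummy fallback for a
    missing symbol), the original count is unknown and is not restored.
    """
    old_n = get_threads()
    set_threads(n)
    try:
        yield
    finally:
        if old_n > 0:
            set_threads(old_n)
```

Assuming the mkl-service package is installed, usage might look like `with temporary_threads(mkl.get_max_threads, mkl.set_num_threads, 1): ...`; with the OpenBLAS helpers from the answer, `temporary_threads(get_num_threads, set_num_threads, 8)` plays the same role as `num_threads(8)`.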

ali_m, answered Oct 12 '22 19:10


We recently developed threadpoolctl, a cross-platform package to control the number of threads used in calls to C-level thread pools in Python. It works similarly to the answer by @ali_m, but automatically detects the libraries that need to be limited by looping through all loaded libraries. It also comes with introspection APIs.
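To illustrate the loaded-library idea (this is not threadpoolctl's actual implementation), here is a Linux-only sketch that reads `/proc/self/maps` to find OpenBLAS libraries already mapped into the process and calls `openblas_set_num_threads` on each. This matters because numpy wheels typically bundle their own copy of OpenBLAS, so loading a system libopenblas by path can end up configuring a different copy than the one numpy actually uses. Opening an already-loaded library by its exact path returns the existing handle, so the call reaches numpy's copy. Assumptions: the Linux `/proc` layout; builds that export the function under a prefixed or suffixed name (some bundled builds do) are silently skipped; both function names are hypothetical.

```python
import ctypes
import os


def loaded_openblas_paths():
    """Return paths of OpenBLAS shared libraries mapped into this process.

    Linux-only sketch: returns an empty list where /proc/self/maps is absent.
    """
    maps_file = "/proc/self/maps"
    if not os.path.exists(maps_file):
        return []
    paths = set()
    with open(maps_file) as f:
        for line in f:
            # Columns: address perms offset dev inode [pathname]; the pathname
            # may contain spaces, so split at most 5 times.
            parts = line.split(maxsplit=5)
            if len(parts) < 6:
                continue
            path = parts[5].strip()
            name = os.path.basename(path).lower()
            if "openblas" in name and ".so" in name:
                paths.add(path)
    return sorted(paths)


def set_openblas_threads_everywhere(n):
    """Call openblas_set_num_threads(n) on every loaded OpenBLAS that exports it.

    Returns the list of library paths that were updated.
    """
    updated = []
    for path in loaded_openblas_paths():
        try:
            # dlopen() of an already-loaded library returns the existing handle.
            lib = ctypes.CDLL(path)
        except OSError:
            continue  # e.g. a mapping whose file was deleted
        setter = getattr(lib, "openblas_set_num_threads", None)
        if setter is None:
            continue  # symbol exported under another name, or not at all
        setter(int(n))
        updated.append(path)
    return updated
```

Import numpy first (so its OpenBLAS is mapped), then call `set_openblas_threads_everywhere(2)`; an empty return value means no matching library or symbol was found.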

This package can be installed using pip install threadpoolctl and comes with a context manager that allows you to control the number of threads used by packages such as numpy:

from threadpoolctl import threadpool_limits
import numpy as np


with threadpool_limits(limits=1, user_api='blas'):
    # In this block, calls to blas implementation (like openblas or MKL)
    # will be limited to use only one thread. They can thus be used jointly
    # with thread-parallelism.
    a = np.random.randn(1000, 1000)
    a_squared = a @ a

You can also have finer control over different thread pools (for example, distinguishing BLAS calls from OpenMP calls).

Note: this package is still in development, and any feedback is welcome.

Thomas Moreau, answered Oct 12 '22 18:10