
Error with OMP_NUM_THREADS when using dask distributed

I am using distributed, a framework for parallel computation, mainly with NumPy. When I submit NumPy code that relies on np.linalg, I get an error that mentions OMP_NUM_THREADS, which is related to the OpenMP library.

A minimal example:

from distributed import Executor
import numpy as np
e = Executor('144.92.142.192:8786')

def f(x, m=200, n=1000):
    A = np.random.randn(m, n)
    x = np.random.randn(n)
    #  return np.fft.fft(x)  # tested; no errors
    #  return np.random.randn(n)  # tested; no errors
    return A.dot(x).sum()  # tested; throws error below

s = [e.submit(f, x) for x in [1, 2, 3, 4]]
s = e.gather(s)

When I run the version that uses A.dot, e.gather fails because each job throws the following error:

OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

What should I set OMP_NUM_THREADS to?

Scott asked Sep 10 '16 03:09




2 Answers

Short answer

export OMP_NUM_THREADS=1

or 

dask-worker --nthreads 1

Explanation

The OMP_NUM_THREADS environment variable controls the number of threads that many libraries, including the BLAS library powering numpy.dot, use in their computations, like matrix multiply.
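Because numpy.dot runs inside the worker process, the variable has to be present in that process's environment, typically before the BLAS library is loaded. The following is a minimal sketch, using only the standard library, of starting a child process with the variable set. The helper name run_with_omp_threads is hypothetical, and the one-line child script only stands in for a real worker such as dask-worker.

```python
import os
import subprocess
import sys


def run_with_omp_threads(code, num_threads=1):
    """Run `code` in a child Python process with OMP_NUM_THREADS set.

    Hypothetical helper for illustration; not part of dask or NumPy.
    """
    # Copy the parent's environment and override only the thread setting.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# The child reports the value it sees in its own environment.
seen = run_with_omp_threads("import os; print(os.environ['OMP_NUM_THREADS'])")
print(seen)
```

The same idea applies to the shell: exporting the variable before starting the worker means every function the worker runs inherits it.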

The conflict here is that you have two parallel libraries nested inside each other: dask.distributed runs your function, which in turn calls BLAS. By default, each library uses as many threads as there are logical cores available in the system.

For example, if you had eight cores, then dask.distributed might run your function f eight times at once on different threads. The numpy.dot function call within f would use eight threads per call, resulting in 64 threads running at once.

This is actually fine as far as correctness goes: everything can still run, but it will be slower than if you used just eight threads at a time, either by limiting dask.distributed or by limiting BLAS.
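The arithmetic behind the eight-core example can be sketched as follows. The function total_threads is a hypothetical helper, and the model is deliberately simplified: it assumes every concurrent task is inside a BLAS call at the same moment.

```python
def total_threads(concurrent_tasks, blas_threads_per_call):
    """Threads in flight if every concurrent task is inside a BLAS call."""
    return concurrent_tasks * blas_threads_per_call


cores = 8

# Both libraries default to one thread per logical core.
unrestricted = total_threads(cores, cores)

# Limit BLAS, for example with OMP_NUM_THREADS=1.
limit_blas = total_threads(cores, 1)

# Limit the worker instead, for example with dask-worker --nthreads 1.
limit_dask = total_threads(1, cores)

print(unrestricted, limit_blas, limit_dask)
```

Either limit brings the total back to the number of cores; which one is preferable depends on whether the workload is many small tasks or a few large matrix operations.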

Your system probably has OMP_THREAD_LIMIT set at some reasonable number like 16 to warn you of this event when it happens.

MRocklin answered Sep 19 '22 20:09


If you're using MKL as your BLAS, you might also get some improvement from the TBB threading layer. I haven't actually had occasion to try it out, so YMMV.

http://conference.scipy.org/proceedings/scipy2018/anton_malakhov.html

Dave Hirschfeld answered Sep 18 '22 20:09