I am running a Python 3.6 script as multiple separate processes on different processors of a parallel computing cluster.
Up to 35 processes run simultaneously with no problem, but the 36th (and any beyond it) crashes with a segmentation fault on the second line of the script, which is import pandas as pd. Interestingly, the first line, import os, does not cause an issue.
The full error message is:
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
Traceback (most recent call last):
File "/home/.../myscript.py", line 32, in <module>
import pandas as pd
File "/home/.../python_venv2/lib/python3.6/site-packages/pandas/__init__.py", line 13, in <module>
__import__(dependency)
File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
from . import add_newdocs
File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
from . import multiarray
SystemError: initialization of multiarray raised unreported exception
/var/spool/slurmd/job04590/slurm_script: line 11: 26963 Segmentation fault python /home/.../myscript.py -x 38
Pandas and a few other packages are installed in a virtual environment. I have duplicated the virtual environment so that no more than 24 processes run in each venv. For example, the traceback above came from a script running in the virtual environment called python_venv2.
The problem occurs on the 36th process every time regardless of how many of the processes are importing from the particular instance of Pandas. (I am not even making a dent in the capacity of the parallel computing cluster.)
So, if it is not a restriction on the number of processes accessing Pandas, is it a restriction on the number of processes running Python? Why is 35 the limit?
Is it possible to install multiple copies of Python on the machine (in separate virtual environments) so that I can run more than 35 processes?
Decomposing the Error Message
Your error message includes the following hint:
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
The RLIMIT_NPROC variable controls the total number of processes that a user can have. More specifically, since it is a per-process setting, when fork(), clone(), vfork(), &c are called by a process, the RLIMIT_NPROC value for that process is compared to the total process count of that process's owning user. If creating the new process or thread would exceed that value, the call fails with "Resource temporarily unavailable", which is exactly what you're seeing.
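If you want to confirm what limit your jobs actually see, one option (a minimal sketch, not something from the original post) is to query it from Python with the standard library's resource module, which reports the same soft/hard pair that appears in the OpenBLAS message:

import resource

# Soft and hard caps on the number of processes/threads this user may create.
# The soft limit (1024 in your log) is what pthread_create is running into.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft:", soft, "hard:", hard)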
The error message indicates that OpenBLAS was unable to create additional threads because your user had already used all the threads that RLIMIT_NPROC allows.
Since you're running on a cluster, it's unlikely that your user is running many other threads (unlike, say, on your personal machine, where you might be browsing the web, playing music, &c), so it's reasonable to conclude that OpenBLAS itself is trying to start a large number of threads.
How OpenBLAS Uses Threads
OpenBLAS can use multiple threads to accelerate linear algebra. You may want many threads for solving a single, larger problem quickly. You may want fewer threads for solving many smaller problems simultaneously.
OpenBLAS has several ways to limit the number of threads it uses. These are controlled via:
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS. (I think this means that OPENBLAS_NUM_THREADS overrides OMP_NUM_THREADS; however, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when it is compiled with USE_OPENMP=1.)
If none of the foregoing variables is set, OpenBLAS defaults to using as many threads as there are cores on the machine, which is 32 in your case.
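Your traceback shows that the OpenBLAS thread pool is created when numpy is imported, so whichever variable you use has to be set before that import happens. If you would rather set it inside the script than in the Slurm job script, a minimal sketch (assuming nothing imports numpy any earlier) is:

import os

# Must run before pandas/numpy are imported: OpenBLAS sizes its
# thread pool when the library is first loaded.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import pandas as pd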
Your Situation
Your cluster has 32-core CPUs. You're trying to run 36 instances of Python. Each instance requires 1 thread for Python itself + 32 threads for OpenBLAS. You'll also need 1 thread for your SSH connection and 1 thread for your shell. That means you need 36*(32+1)+2 = 1190 threads, which is comfortably above the RLIMIT_NPROC value of 1024 reported in your error message.
The nuclear option for fixing the problem is to use:
export OPENBLAS_NUM_THREADS=1
which should bring you down to 36*(1+1)+2=74 threads.
Since you have spare capacity, you could adjust OPENBLAS_NUM_THREADS to a higher value, but then the OpenBLAS instances owned by your separate Python processes will interfere with each other. So there's a trade-off between how fast you get one solution and how fast you can get many solutions. Ideally, you can resolve this trade-off by running fewer Pythons per node and using more nodes.
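For example, if you settle on a fixed number of Python processes per 32-core node, a rough way to split the cores evenly (purely illustrative numbers, assuming one node and the figures above) is:

# Illustrative thread budget; adjust to your cluster's layout.
cores_per_node = 32
procs_per_node = 8                                          # hypothetical choice
threads_per_proc = max(1, cores_per_node // procs_per_node)
print(f"export OPENBLAS_NUM_THREADS={threads_per_proc}")    # prints 4 here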