Multiple instances of Python running simultaneously limited to 35

I am running a Python 3.6 script as multiple separate processes on different processors of a parallel computing cluster. Up to 35 processes run simultaneously with no problem, but the 36th (and any beyond it) crashes with a segmentation fault on the second line, import pandas as pd. Interestingly, the first line, import os, does not cause an issue. The full error message is:

OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
Traceback (most recent call last):
  File "/home/.../myscript.py", line 32, in <module>
    import pandas as pd
  File "/home/.../python_venv2/lib/python3.6/site-packages/pandas/__init__.py", line 13, in <module>
    __import__(dependency)
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
    from . import multiarray
SystemError: initialization of multiarray raised unreported exception
/var/spool/slurmd/job04590/slurm_script: line 11: 26963 Segmentation fault      python /home/.../myscript.py -x 38

Pandas and a few other packages are installed in a virtual environment. I have duplicated the virtual environment so that no more than 24 processes run in each venv. For example, the error output above came from a process running in the virtual environment called python_venv2.

The problem occurs on the 36th process every time, regardless of how many of the processes are importing from that particular copy of Pandas. (I am not even making a dent in the capacity of the parallel computing cluster.)

So, if it is not a restriction on the number of processes accessing Pandas, is it a restriction on the number of processes running Python? Why is 35 the limit?

Is it possible to install multiple copies of Python on the machine (in separate virtual environments?) so that I can run more than 35 processes?

asked Jul 10 '18 03:07 by doctorer



1 Answer

Decomposing the Error Message

Your error message includes the following hint:

OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max

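RLIMIT_NPROC is a resource limit on the total number of processes a user can have; on Linux it actually counts threads, because each thread is created with clone() just like a process. More specifically, as it is a per-process setting, when a process calls fork(), vfork(), clone(), etc., the kernel compares the total number of processes and threads already owned by that process's real user ID against that process's RLIMIT_NPROC value. If the limit has been reached, the call fails with EAGAIN (the "Resource temporarily unavailable" in your log), and things fall over, as you've experienced.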

The error message indicates that OpenBLAS was unable to create additional threads because your user had already used all the threads RLIMIT_NPROC allows it. (1024 is the soft limit currently in effect; 2067021 is the hard limit.)
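To check these numbers yourself on a compute node, here is a minimal Linux-only sketch. It reads the limit with Python's standard resource module and adds up the threads your user already owns by scanning /proc/<pid>/status. The helper names nproc_limits and count_user_threads are hypothetical; the "current" and "max" figures OpenBLAS prints correspond to the soft and hard values that getrlimit returns.

```python
import os
import resource


def _finite(value):
    """Map RLIM_INFINITY to None so that 'unlimited' is explicit."""
    return None if value == resource.RLIM_INFINITY else value


def nproc_limits():
    """Return (soft, hard) RLIMIT_NPROC for this process; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return _finite(soft), _finite(hard)


def count_user_threads(uid=None):
    """Sum the threads of every process whose real UID is `uid` (Linux /proc only)."""
    uid = os.getuid() if uid is None else uid
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(f"/proc/{entry}/status", errors="replace") as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
        except OSError:
            continue  # process exited while we were scanning, or is unreadable
        real_uid = int(fields["Uid"].split()[0])  # first Uid field is the real UID
        if real_uid == uid:
            total += int(fields.get("Threads", "1"))
    return total


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"RLIMIT_NPROC: soft (current) = {soft}, hard (max) = {hard}")
    print(f"threads owned by this user right now: {count_user_threads()}")
```

Running it from inside a job, while your other Python processes are alive on the same node, shows how close the user is to the soft limit.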

Since you're running on a cluster, it's unlikely that your user is running many other threads (unlike, say, on your personal machine, where you might be browsing the web, playing music, etc.), so it's reasonable to conclude that OpenBLAS itself is starting many threads.

How OpenBLAS Uses Threads

OpenBLAS can use multiple threads to accelerate linear algebra. You may want many threads for solving a single, larger problem quickly. You may want fewer threads for solving many smaller problems simultaneously.

OpenBLAS lets you limit the number of threads it uses through any of these environment variables:

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4

The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS: if more than one is set, OPENBLAS_NUM_THREADS wins over GOTO_NUM_THREADS, which wins over OMP_NUM_THREADS. The exception is OpenBLAS compiled with USE_OPENMP=1, which ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS, so for such builds you need to set OMP_NUM_THREADS.
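The precedence rule can be modelled as a small function. This is a hypothetical sketch of the documented behaviour, not OpenBLAS's actual code: the name effective_openblas_threads is made up, it treats anything that is not a positive integer as unset, and it leaves out details such as any upper bound a particular build applies to the thread count.

```python
import os


def _positive_int(value):
    """Parse an environment value; anything that is not a positive integer counts as unset."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    return n if n > 0 else None


def effective_openblas_threads(env=None, use_openmp=False, cores=None):
    """Hypothetical model of how OpenBLAS picks its thread count from the environment."""
    env = os.environ if env is None else env
    cores = cores or os.cpu_count() or 1
    if use_openmp:
        # Builds with USE_OPENMP=1 ignore OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS.
        order = ("OMP_NUM_THREADS",)
    else:
        order = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")
    for name in order:
        n = _positive_int(env.get(name))
        if n is not None:
            return n  # first variable (in priority order) that is set wins
    return cores  # nothing set: one thread per core


env = {"OPENBLAS_NUM_THREADS": "2", "OMP_NUM_THREADS": "8"}
print(effective_openblas_threads(env, cores=32))                   # 2
print(effective_openblas_threads(env, use_openmp=True, cores=32))  # 8
print(effective_openblas_threads({}, cores=32))                    # 32
```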

If none of the foregoing variables are set, OpenBLAS will use a number of threads equal to the number of cores on your machine (presumably 32 on yours).

Your Situation

Assume your cluster's nodes have 32-core CPUs. You're trying to run 36 instances of Python. Each instance requires 1 thread for Python + 32 threads for OpenBLAS. You'll also need 1 thread for your SSH connection and 1 thread for your shell. That means you need roughly 36*(32+1)+2=1190 threads, well above the 1024 allowed by your soft RLIMIT_NPROC.
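The estimate above is easy to write down as a function, which makes it simple to plug in your nodes' real core count. threads_needed is a hypothetical helper that encodes exactly the answer's assumptions: one Python thread per process, blas_threads OpenBLAS threads per process, and two extra threads for the SSH session and shell.

```python
RLIMIT_NPROC_SOFT = 1024  # "1024 current" from the OpenBLAS error message


def threads_needed(n_procs, blas_threads, extra=2):
    """Rough thread budget: each process = 1 Python thread + blas_threads OpenBLAS threads,
    plus `extra` threads for the SSH connection and shell (assumed, as in the text)."""
    return n_procs * (1 + blas_threads) + extra


for blas in (32, 1):
    need = threads_needed(36, blas)
    status = "over" if need > RLIMIT_NPROC_SOFT else "within"
    print(f"OPENBLAS_NUM_THREADS={blas}: {need} threads, {status} the limit of {RLIMIT_NPROC_SOFT}")
# OPENBLAS_NUM_THREADS=32: 1190 threads, over the limit of 1024
# OPENBLAS_NUM_THREADS=1: 74 threads, within the limit of 1024
```

This is a back-of-the-envelope figure rather than an exact count; the point is the order of magnitude compared with 1024.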

The nuclear option for fixing the problem is to use:

export OPENBLAS_NUM_THREADS=1

which should bring you down to about 36*(1+1)+2=74 threads.
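If changing the job script is inconvenient, the same setting can be applied from inside Python, as long as it happens before numpy (and therefore pandas) is imported for the first time: OpenBLAS reads these variables when its library is loaded, so changing them afterwards does not shrink a thread pool that already exists. limit_blas_threads is a hypothetical helper; it also sets OMP_NUM_THREADS because, as noted above, OpenMP builds of OpenBLAS ignore OPENBLAS_NUM_THREADS.

```python
import os
import sys
import warnings


def limit_blas_threads(n=1):
    """Cap BLAS thread pools via environment variables; call before numpy is first imported."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if "numpy" in sys.modules:
        # Too late: OpenBLAS has already read the environment and started its threads.
        warnings.warn("numpy is already imported; this will not affect OpenBLAS",
                      RuntimeWarning, stacklevel=2)
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS"):
        os.environ[var] = str(n)


limit_blas_threads(1)
# Only now import the heavy packages, e.g.:
# import pandas as pd
```

Putting limit_blas_threads(1) at the very top of myscript.py, before import pandas as pd, has the same effect as the export line in the Slurm script.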

Since you have spare capacity, you could set OPENBLAS_NUM_THREADS to a higher value, but then the OpenBLAS thread pools owned by your separate Python processes will compete with each other for the same cores. So there's a trade-off between how fast you get one solution and how fast you can get many solutions. Ideally, you can resolve this trade-off by running fewer Python processes per node and using more nodes.
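One simple rule of thumb for that trade-off, assuming every process on a node should get an equal share of its cores, is to divide the node's cores by the number of Python processes running on it. blas_threads_per_process is a hypothetical helper, and 32 cores per node is the same assumption used above.

```python
def blas_threads_per_process(cores_per_node, procs_per_node):
    """Give each Python process an equal share of the node's cores, but never fewer than 1."""
    if procs_per_node < 1:
        raise ValueError("procs_per_node must be at least 1")
    return max(1, cores_per_node // procs_per_node)


CORES = 32  # assumed core count per node, as in the text
for procs in (36, 8, 4):
    t = blas_threads_per_process(CORES, procs)
    print(f"{procs} processes per node -> OPENBLAS_NUM_THREADS={t}")
# 36 processes per node -> OPENBLAS_NUM_THREADS=1
# 8 processes per node -> OPENBLAS_NUM_THREADS=4
# 4 processes per node -> OPENBLAS_NUM_THREADS=8
```

With 36 processes on one 32-core node there is nothing left to share, so each process gets a single thread; spreading the same processes over more nodes frees cores for OpenBLAS within each process.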

answered Nov 15 '22 16:11 by Richard
