
Python multiprocessing, using pool multiple times in a loop gets stuck after first iteration

I create a pool in a for loop as follows (I know it's not very elegant, but I have to do this for pickling reasons). Assume that pathos.multiprocessing is equivalent to Python's multiprocessing library (it is, up to some details that are not relevant to this problem). This is the code I want to execute:

self.pool = pathos.multiprocessing.ProcessingPool(number_processes)

for i in range(5):
    all_responses = self.pool.map(wrapper_singlerun, range(self.no_of_restarts))
    self.pool._clear()

Now my problem: the loop successfully runs the first iteration. On the second iteration, however, the program hangs (the pool.map call never returns). I suspected that zombie processes were being generated, or that the process was somehow switched. Below is everything I have tried so far.

for i in range(5):
    pool = pathos.multiprocessing.ProcessingPool(number_processes)
    all_responses = pool.map(wrapper_singlerun, range(self.no_of_restarts))
    pool._clear()
    gc.collect()

    for p in multiprocessing.active_children():
        p.terminate()
        gc.collect()

    print("We have so many active children: ", multiprocessing.active_children()) # Returns []

The above code works perfectly well on my Mac. However, when I run it on a cluster with the following specs, it gets stuck after the first iteration:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"

This is the link to the pathos multiprocessing library file.
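Since the code behaves differently on the Mac and on the Ubuntu cluster, one way to compare the two machines is to print the operating system and the process start method on each. The following is only a sketch: it uses the standard library's multiprocessing (assumed equivalent to pathos.multiprocessing, as stated in the question), and `describe_runtime` is a hypothetical helper name, not part of either library.

```python
import multiprocessing
import platform


def describe_runtime():
    """Collect the platform details that affect how worker processes start."""
    return {
        # e.g. "Linux" or "Darwin"
        "system": platform.system(),
        # Start method this interpreter will use by default
        "start_method": multiprocessing.get_start_method(),
        # Every start method available on this platform
        "available_methods": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(key, "=", value)
```

Running this on both machines shows whether they start workers the same way, which is worth ruling out before looking at the pool cleanup itself.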

DaveTheAl asked Jul 06 '18 12:07




1 Answer

I am assuming that you are calling this from inside some function, which is not the correct way to use it.

You need to wrap it with:

if __name__ == '__main__':
    for i in range(5):
        pool = pathos.multiprocessing.Pool(number_processes)
        all_responses = pool.map(wrapper_singlerun, range(self.no_of_restarts))

If you don't, each child process will re-run the module and create a copy of the pool itself. Those copies keep piling up until they exhaust the available resources and block everything. The reason it works on Mac is that Mac has fork, while Windows does not.
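A self-contained version of this pattern could look like the sketch below. It uses the standard library's multiprocessing.Pool rather than pathos (assumed equivalent here, as in the question). `square` and `run_rounds` are hypothetical stand-ins for `wrapper_singlerun` and the surrounding loop.

```python
import multiprocessing


def square(x):
    # Hypothetical stand-in for wrapper_singlerun. It is defined at module
    # level so that worker processes can look it up by name.
    return x * x


def run_rounds(rounds, workers, n_tasks):
    """Create a fresh pool for every round and shut it down before the next."""
    results = []
    for _ in range(rounds):
        # Leaving the with-block terminates this round's worker processes.
        with multiprocessing.Pool(workers) as pool:
            results.append(pool.map(square, range(n_tasks)))
    return results


if __name__ == "__main__":
    # Only the parent process executes this block. A worker that re-imports
    # the module skips it, so it cannot start pools of its own.
    print(run_rounds(rounds=5, workers=2, n_tasks=4))
```

Keeping the pool creation inside a function that is only called from the guarded block means that importing the module never starts any processes.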

dilkash answered Oct 22 '22 11:10