Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python multiprocessing pool: how can I know when all the workers in the pool have finished?

I am running a multiprocessing pool in python, where I have ~2000 tasks, being mapped to 24 workers with the pool. each task creates a file based on some data analysis and webservices.

I want to run a new task, when all the tasks in the pool were finished. how can I tell when all the processes in the pool have finished?

like image 735
Dror Hilman Avatar asked May 19 '15 04:05

Dror Hilman


People also ask

Do I need to close multiprocessing pool?

If the process pool is not explicitly closed, it means that the resources required to operate the process pool, e.g. the child processes, their threads, and stack space, may not be released and made available to the program. multiprocessing.

How do I check if a process is running in Python multiprocessing?

We can check if a process is alive via the multiprocessing. Process. is_alive() method.

What is the difference between pool and process in Python?

Pool is generally used for heterogeneous tasks, whereas multiprocessing. Process is generally used for homogeneous tasks. The Pool is designed to execute heterogeneous tasks, that is tasks that do not resemble each other. For example, each task submitted to the process pool may be a different target function.

How do processes pools work in multiprocessing?

Pool allows multiple jobs per process, which may make it easier to parallel your program. If you have a numbers jobs to run in parallel, you can make a Pool with number of processes the same number of as CPU cores and after that pass the list of the numbers jobs to pool. map.


1 Answers

You want to use the join method, which halts the main process thread from moving forward until all sub-processes ends:

Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs.

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    processes = []
    for i in range(10):
        p = Process(target=f, args=('bob',))
        processes.append(p)

    for p in processes:
        p.start()
        p.join()

     # only get here once all processes have finished.
     print('finished!')

EDIT:

To use join with pools

    pool = Pool(processes=4)  # start 4 worker processes
    result = pool.apply_async(f, (10,))  # do some work
    pool.close()
    pool.join()  # block at this line until all processes are done
    print("completed")
like image 171
14 revs, 12 users 16% Avatar answered Sep 20 '22 21:09

14 revs, 12 users 16%