
Use Python Pool with context manager or close and join

Tags:

python

pool

The Python documentation has examples in the format of

with Pool() as p:
    p.map(do, tasks)

but I see a lot of people using the format below.

p = Pool()
p.map(do, tasks)
p.close()
p.join()

Which is more desirable?

Seung asked Mar 07 '19 02:03


People also ask

Do I need to close multiprocessing pool?

If the process pool is not explicitly closed, the resources required to operate it, such as the child processes, their threads, and their stack space, may not be released and made available to the program.

What does pool Join do Python?

Pool in Python provides a pool of reusable worker processes for executing ad hoc tasks. Calling join() on a pool blocks until all of its worker processes have exited; close() or terminate() must be called first, otherwise join() raises a ValueError.

What is the difference between pool and process in Python?

multiprocessing.Pool is generally used for heterogeneous tasks, whereas multiprocessing.Process is generally used for homogeneous tasks. The Pool is designed to execute heterogeneous tasks, that is, tasks that do not resemble each other. For example, each task submitted to the process pool may use a different target function.

What does join do in Python multiprocessing?

The join method blocks the execution of the main process until the process whose join method was called terminates. Without join, the main process won't wait for the child process to finish before continuing.


1 Answer

I think using Pool as a context manager (i.e., with Pool() as p:) is preferable. Context-manager support was added to Pool in Python 3.3, and it more cleanly encapsulates the lifespan of the pool.

One thing to be aware of: when the context manager exits, it calls terminate() on the pool, which immediately stops any ongoing tasks. This means you may still want to call p.close() and p.join() inside the with block in some cases. Your example doesn't require this, because p.map blocks until the tasks are done anyway:

A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.

https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.pool.Pool.map

Therefore, in the second example, the call to .join() is unnecessary, as .map() will block until all tasks have completed.

However, using .map_async would make .join useful:

with Pool() as p:
    p.map_async(do_something, range(100))
    # Do something else while tasks are running
    p.close()
    p.join()

Edit: as Facundo Olano points out, .close() must always be called before .join(), as stated in the docs:

Wait for the worker processes to exit. One must call close() or terminate() before using join().

https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.pool.Pool.join

Nikolas Stevenson-Molnar answered Sep 19 '22 16:09