The Python documentation has examples in the format of
with Pool() as p:
    p.map(do)
but I see a lot of people using the format below.
p = Pool()
p.map(do)
p.close()
p.join()
Which is more desirable?
If the process pool is not explicitly closed, the resources required to operate it (the child processes, their threads, and their stack space) may not be released and made available to the rest of the program. multiprocessing.Pool provides a pool of reusable processes for executing ad hoc tasks. The pool can be configured when it is created, which prepares the child workers; the resulting object controls a pool of worker processes to which jobs can be submitted.
multiprocessing.Pool is generally used for heterogeneous tasks, whereas multiprocessing.Process is generally used for homogeneous tasks. The Pool is designed to execute heterogeneous tasks, that is, tasks that do not resemble each other; for example, each task submitted to the process pool may have a different target function.
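As a rough illustration of heterogeneous tasks, here is a minimal sketch where each submission to the pool targets a different function (the built-ins abs and max are just stand-in tasks):

```python
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(processes=2) as p:
        # Each apply_async call can use a different target function.
        r1 = p.apply_async(abs, (-7,))
        r2 = p.apply_async(max, (3, 9))
        print(r1.get(), r2.get())  # 7 9
```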
Python multiprocessing join: the join() method blocks the execution of the main process until the process whose join() method is called terminates. Without join(), the main process won't wait for the child process to finish.
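A minimal sketch of join() on a plain Process (the target here, the built-in print, is just a stand-in task):

```python
from multiprocessing import Process

if __name__ == "__main__":
    p = Process(target=print, args=("hello from the child",))
    p.start()
    p.join()                # block until the child terminates
    assert p.exitcode == 0  # the child ran to completion
```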
I think using Pool as a context manager (i.e., with Pool() as p:) is desirable. It's a newer addition to Pool, and it lets you more cleanly encapsulate the lifespan of the pool.
One thing to be aware of is that when the context manager exits, it terminates the pool and any ongoing tasks. This means that in some cases you still want to call p.join() inside the block. Your example doesn't require this, because p.map will block execution until the tasks are done anyway:
A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.pool.Pool.map
Therefore, in the second example, the call to .join() is unnecessary, as .map() will block until all tasks have completed.
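For instance, in this minimal sketch (using the built-in abs as a stand-in task), the results are already available before the with block even exits, so no extra .join() is needed:

```python
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool() as p:
        results = p.map(abs, [-1, -2, -3])  # blocks until every task finishes
    print(results)  # [1, 2, 3]
```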
However, using .map_async
would make .join
useful:
with Pool() as p:
    p.map_async(do_something, range(100))
    # Do something else while tasks are running
    p.close()
    p.join()
Edit: as Facundo Olano points out, .close() must always be called before .join(), as stated in the docs:
Wait for the worker processes to exit. One must call close() or terminate() before using join().
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing.pool.Pool.join