The documentation for multiprocessing states the following about Pool.join():
Wait for the worker processes to exit. One must call close() or terminate() before using join().
I know that Pool.close() prevents any other task from being submitted to the pool, and that Pool.join() waits for the pool to finish before proceeding with the parent process.
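To make those two facts concrete, here is a minimal sketch (the names square and demo_close_then_join are hypothetical, chosen only for illustration). It submits one task, calls close(), shows that a further submission is rejected with ValueError, and then calls join(), which still lets the task submitted before close() complete:

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical toy task: return x squared."""
    return x * x


def demo_close_then_join():
    pool = Pool(processes=2)
    # Submit work while the pool is still running.
    pending = pool.apply_async(square, (7,))

    # close(): no new tasks may be submitted from now on.
    pool.close()
    try:
        pool.apply_async(square, (8,))
        rejected = False
    except ValueError:
        # A closed pool refuses new work with ValueError.
        rejected = True

    # join(): block until the worker processes have exited.
    # Tasks submitted before close() are still completed first.
    pool.join()
    return pending.get(), rejected


if __name__ == "__main__":
    print(demo_close_then_join())  # (49, True)
```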
So, why can I not call Pool.join() before Pool.close() in the case when I want to reuse my pool for performing multiple tasks and then finally close() it much later? For example:
pool = Pool()
pool.map(do1)
pool.join() # need to wait here for synchronization
# ...
pool.map(do2)
pool.join() # need to wait here again for synchronization
# ...
pool.map(do3)
pool.join() # need to wait here again for synchronization
pool.close()
# program ends
Why must one "call close() or terminate() before using join()"?
As for Pool.close(), you should call it when, and only when, you are never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished.
pool.close() makes sure that the pool does not accept new tasks, and pool.join() waits for the worker processes to finish their work and exit.
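A related pattern is using the pool as a context manager. One caveat worth knowing: leaving the with block calls terminate(), not close(), so this is only safe once every result has been collected. The sketch below assumes Python 3.3 or newer (where Pool supports the with statement); cube and run_all are hypothetical names:

```python
from multiprocessing import Pool


def cube(x):
    """Hypothetical toy task: return x cubed."""
    return x ** 3


def run_all(numbers):
    # Leaving the with block calls terminate(), not close(); any work
    # still outstanding at that point would be discarded.
    with Pool(processes=2) as pool:
        # map() blocks until every result is ready, so nothing is
        # outstanding when the block exits.
        return pool.map(cube, numbers)


if __name__ == "__main__":
    print(run_all([1, 2, 3]))  # [1, 8, 27]
```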
The pool.imap() method is similar to pool.map(). The difference is that imap() yields each result as soon as it is available (still in input order; imap_unordered() yields results in completion order), instead of waiting for all of them to be finished. Moreover, map() converts the iterable into a list (if it is not one already), whereas imap() does not.
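A small sketch of the difference (double is a hypothetical toy task): map() returns a finished list, while imap() returns an iterator that can be consumed result by result. The iterator is consumed inside the with block, before the pool is terminated:

```python
from multiprocessing import Pool


def double(x):
    """Hypothetical toy task: return 2 * x."""
    return 2 * x


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # map(): blocks, then returns a complete list of results.
        eager = pool.map(double, range(5))

        # imap(): returns an iterator right away; each result is yielded,
        # in input order, once it is available.
        lazy = pool.imap(double, (n for n in range(5)))
        streamed = []
        for value in lazy:
            streamed.append(value)  # each value could be processed here

    print(eager)     # [0, 2, 4, 6, 8]
    print(streamed)  # [0, 2, 4, 6, 8]
```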
So, why can I not call Pool.join() before Pool.close()?
Because join() waits for the workers to exit: not just to finish the tasks they've been given, but to actually exit. If you haven't called close() beforehand, then nobody has told the workers to exit, and they stay on standby, ready to accept further tasks.
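One way to see that the workers stay alive between tasks is to have each task report the PID of the process that ran it. In this sketch (worker_pid is a hypothetical helper, and the pool is assumed to use the default maxtasksperchild=None, so workers are not recycled), two consecutive map() calls on a two-worker pool never see more than two distinct PIDs:

```python
import os
from multiprocessing import Pool


def worker_pid(_):
    """Hypothetical toy task: report which worker process ran it."""
    return os.getpid()


if __name__ == "__main__":
    pool = Pool(processes=2)
    first = set(pool.map(worker_pid, range(20)))
    # map() has returned, but the workers have not exited: they sit idle,
    # waiting for more tasks, so the second call reuses the same processes.
    second = set(pool.map(worker_pid, range(20)))
    pool.close()
    pool.join()

    print(len(first | second) <= 2)  # True
```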
So a call to join() that is not preceded by a call to close() would just hang: join() would wait forever for workers to exit, which nobody told them to do. For this reason Python raises ValueError("Pool is still running") if you attempt to do so.
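A minimal sketch of that check, assuming a recent CPython (3.8 or newer) where join() on a running pool raises ValueError; identity and join_without_close are hypothetical names:

```python
from multiprocessing import Pool


def identity(x):
    """Hypothetical toy task: return its argument unchanged."""
    return x


def join_without_close():
    pool = Pool(processes=2)
    pool.map(identity, range(3))
    try:
        # The pool is still running: nobody has told the workers to exit.
        pool.join()
        error = None
    except ValueError as exc:
        error = exc
    finally:
        # Correct order: close() first, then join().
        pool.close()
        pool.join()
    return error


if __name__ == "__main__":
    print("join() refused:", repr(join_without_close()))
```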
As David Schwartz pointed out, don't call join() to "synchronize"; it doesn't serve that purpose.
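If you do need an explicit synchronization point with non-blocking submission, one option is the AsyncResult returned by map_async(): its wait() and get() methods block on that batch only, and the pool stays open for the next batch. A sketch, with add_one as a hypothetical task:

```python
from multiprocessing import Pool


def add_one(x):
    """Hypothetical toy task: return x + 1."""
    return x + 1


if __name__ == "__main__":
    pool = Pool(processes=2)

    # map_async() returns immediately with an AsyncResult.
    batch = pool.map_async(add_one, range(4))
    # ... the parent can do other work here ...
    batch.wait()          # synchronization point: block until results are ready
    print(batch.get())    # [1, 2, 3, 4]

    # The pool is still running and can accept the next batch.
    print(pool.map(add_one, [10, 20]))  # [11, 21]

    pool.close()
    pool.join()
```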
You don't need to call join() after map() in your case, because the map() call blocks until all results are ready.
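Applied to the code from the question, a runnable sketch could look like this. The bodies of do1, do2 and do3 and the input range(5) are hypothetical stand-ins, since the question does not define them:

```python
from multiprocessing import Pool


# Hypothetical task functions standing in for do1/do2/do3 from the question.
def do1(x):
    return x + 1


def do2(x):
    return x * 2


def do3(x):
    return x - 3


if __name__ == "__main__":
    pool = Pool()
    r1 = pool.map(do1, range(5))  # blocks until all do1 results are ready
    # ... use r1 ...
    r2 = pool.map(do2, r1)        # same pool, same workers, no join() needed
    # ... use r2 ...
    r3 = pool.map(do3, r2)
    pool.close()  # no more work will be submitted
    pool.join()   # now wait for the workers to exit
    print(r3)     # [-1, 1, 3, 5, 7]
```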
Calling join() before close() or terminate() is incorrect, because join() is a blocking call that waits for the worker processes to exit. And since join() can only follow close() or terminate(), you cannot reuse the pool after join().
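The other permitted order is terminate() followed by join(). This sketch (negate and run_then_terminate are hypothetical names) collects results first, terminates and joins the pool, and then shows that the finished pool rejects new work with ValueError; more work would need a new Pool:

```python
from multiprocessing import Pool


def negate(x):
    """Hypothetical toy task: return -x."""
    return -x


def run_then_terminate():
    pool = Pool(processes=2)
    results = pool.map(negate, [1, 2, 3])
    # terminate() stops the workers immediately (outstanding work is dropped);
    # like close(), it is a valid precondition for join().
    pool.terminate()
    pool.join()
    try:
        pool.map(negate, [4])  # the pool is finished and cannot be reused
        reusable = True
    except ValueError:
        reusable = False  # create a new Pool for any further work
    return results, reusable


if __name__ == "__main__":
    print(run_then_terminate())  # ([-1, -2, -3], False)
```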