Understand multiprocessing in no more than 6 minutes

Multiprocessing is essential when a long-running task has to be sped up or several tasks have to run in parallel. Executing a program on a single core confines it to that core, while multiprocessing lets the work spread across multiple cores.
pool.close() makes sure the process pool does not accept any new tasks, and pool.join() waits for the worker processes to finish their work and exit.
pool.map() works like a map-reduce architecture: it distributes the input across the worker processes and collects the output from all of them. It waits for every task to finish and then returns the results as a list, in the same order as the input.
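As a minimal sketch of that behaviour (the square worker and the range(10) input are placeholders, not part of the original answers): pool.map() distributes the items across the workers, blocks until every task has finished, and returns the results as a list in input order.

from multiprocessing import Pool

def square(x):
    # placeholder worker function; executes in a separate worker process
    return x * x

if __name__ == "__main__":
    pool = Pool(4)                          # four worker processes
    results = pool.map(square, range(10))   # blocks until every task has finished
    pool.close()                            # the pool will accept no new work
    pool.join()                             # wait for the worker processes to exit
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]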
No, you don't have to call pool.close() and pool.join(), but it's probably a good idea if you aren't going to use the pool anymore.
The reasons for calling pool.close() or pool.join() are well put by Tim Peters in this SO post:
As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.
It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.
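To illustrate that last point, here is a small sketch (the might_fail worker and report_error callback are hypothetical names, not from the quoted post). With apply_async and no call to get(), a worker-side exception would otherwise pass silently; an error_callback combined with pool.close() and pool.join() guarantees every failure has been reported before the program continues.

from multiprocessing import Pool

def might_fail(x):
    # hypothetical worker: raises for one input to simulate a failure in a worker process
    if x == 3:
        raise ValueError("bad input: %r" % x)
    return x * 2

def report_error(exc):
    # runs in the main process whenever a submitted task raises
    print("worker failed:", exc)

if __name__ == "__main__":
    pool = Pool(2)
    for i in range(5):
        pool.apply_async(might_fail, (i,), error_callback=report_error)
    pool.close()   # the parallel part is finished; no more work will be submitted
    pool.join()    # by the time this returns, every pending error_callback has run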
I had the same memory issue as "Memory usage keep growing with Python's multiprocessing.pool" when I didn't use pool.close() and pool.join() with pool.map() and a function that calculated Levenshtein distance. The function worked fine, but wasn't garbage collected properly on a Win7 64 machine; memory usage kept growing out of control every time the function was called, until it took the whole operating system down. Here's the code that fixed the leak:
from multiprocessing import Pool

# Build one (searchString, possible_string) pair per comparison task
stringList = []
for possible_string in stringArray:
    stringList.append((searchString, possible_string))

pool = Pool(5)                                          # five worker processes
results = pool.map(myLevenshteinFunction, stringList)   # blocks until all tasks finish
pool.close()                                            # no more work will be submitted
pool.join()                                             # wait for the workers to exit
After closing and joining the pool the memory leak went away.