I was using this answer in order to run parallel commands with multiprocessing in Python on a Linux box.
My code did something like:
    import multiprocessing
    import logging

    def cycle(offset):
        # Do stuff

    def run():
        for nprocess in process_per_cycle:
            logger.info("Start cycle with %d processes", nprocess)
            offsets = list(range(nprocess))
            pool = multiprocessing.Pool(nprocess)
            pool.map(cycle, offsets)
But I was getting this error:

    OSError: [Errno 24] Too many open files
So the code was opening too many file descriptors, i.e. it was starting too many processes and never terminating them.
I fixed it by replacing the last two lines with these:
    with multiprocessing.Pool(nprocess) as pool:
        pool.map(cycle, offsets)
But I do not know exactly why those lines fixed it. What is happening underneath that with statement?
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
When a process is forked, the child process inherits all the same variables, in the same state as they were in the parent. Each child process then continues independently from the forking point. The pool divides the arguments between the children, and they work through them sequentially.
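For illustration, here is a minimal sketch (the worker function, pool size, and inputs are arbitrary) showing the pool handing pieces of the iterable to its child processes:

    import multiprocessing
    import os

    def work(x):
        # runs in a child process; os.getpid() shows which worker handled x
        return os.getpid(), x * x

    if __name__ == "__main__":
        with multiprocessing.Pool(4) as pool:
            for pid, square in pool.map(work, range(8)):
                print(f"worker {pid} computed {square}")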
You're creating new processes inside a loop, and then forgetting to close them once you're done with them. As a result, there comes a point where you have too many running processes and open file descriptors. This is a bad idea.
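To see the failure mode, here is a minimal sketch that reproduces it (the counts are arbitrary; each Pool spawns workers and opens pipes, and nothing ever releases them):

    import multiprocessing

    def noop(x):
        return x

    if __name__ == "__main__":
        leaked = []
        for _ in range(100):
            pool = multiprocessing.Pool(8)   # spawns 8 workers and opens several pipes
            pool.map(noop, range(8))
            leaked.append(pool)              # never closed or terminated: fds accumulate
        # with a typical open-file limit, this eventually raises
        # OSError: [Errno 24] Too many open files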
You could fix this by using a context manager, which automatically calls pool.terminate when the block exits, or by calling pool.terminate yourself. Alternatively, you could create the pool just once, outside the loop, and then send tasks to its processes inside it:
    pool = multiprocessing.Pool()    # initialise your pool once (defaults to os.cpu_count() workers)
    for nprocess in process_per_cycle:
        ...
        pool.map(cycle, offsets)     # delegate work inside your loop
    pool.close()                     # no more tasks: let the workers exit
    pool.join()                      # wait for them to finish
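As for what is happening underneath that with: Pool supports the context-manager protocol, and its __exit__ method calls terminate(). So the question's fix is roughly equivalent to this sketch (the worker and inputs here are stand-ins):

    import multiprocessing

    def cycle(offset):
        return offset  # stand-in for the question's worker

    if __name__ == "__main__":
        offsets = list(range(4))
        # roughly what `with multiprocessing.Pool(4) as pool:` expands to:
        pool = multiprocessing.Pool(4)
        try:
            print(pool.map(cycle, offsets))
        finally:
            pool.terminate()  # __exit__ calls terminate(), releasing workers and their pipes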
For more information, you could peruse the multiprocessing.Pool documentation.