
multiprocessing returns "too many open files" but using `with...as` fixes it. Why?


I was using this answer in order to run parallel commands with multiprocessing in Python on a Linux box.

My code did something like:

import multiprocessing
import logging

def cycle(offset):
    # Do stuff

def run():
    for nprocess in process_per_cycle:
        logger.info("Start cycle with %d processes", nprocess)
        offsets = list(range(nprocess))
        pool = multiprocessing.Pool(nprocess)
        pool.map(cycle, offsets)

But I was getting this error: OSError: [Errno 24] Too many open files
So the code was opening too many file descriptors, i.e. it was starting too many processes and not terminating them.
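
For reference, the per-process descriptor limit that this error runs into can be inspected with the standard resource module. A minimal sketch (Linux-specific, purely illustrative):

import resource

# Each pool worker keeps pipes (file descriptors) open to the parent,
# so pools that are never terminated eventually exhaust the soft limit
# and raise OSError: [Errno 24].
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)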

I fixed it by replacing the last two lines with these lines:

        with multiprocessing.Pool(nprocess) as pool:
            pool.map(cycle, offsets)

But I do not know exactly why those lines fixed it.

What is happening underneath that with?

asked Aug 14 '17 by nephewtom


People also ask

What is the purpose of the process multiprocessing?

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
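
A minimal sketch of that idea (the function and pool size here are illustrative): CPU-bound work is spread across worker processes, each with its own interpreter and its own GIL.

import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    # Four worker processes run square() in parallel, bypassing the GIL.
    with multiprocessing.Pool(4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]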

Is Ray better than multiprocessing?

On a machine with 48 physical cores, Ray is 6x faster than Python multiprocessing and 17x faster than single-threaded Python. Python multiprocessing doesn't outperform single-threaded Python on fewer than 24 cores.

What is forking in multiprocessing?

When a process is forked, the child process inherits all the same variables in the same state as they were in the parent. Each child process then continues independently from the forking point. The pool divides the arguments between the children, and they work through them sequentially.
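
A small sketch of that behaviour (Unix-only, since it forces the fork start method; the names are illustrative):

import multiprocessing

counter = 100  # set in the parent before the workers are forked

def show(i):
    # Each forked child inherited counter=100; this read sees the
    # child's private copy, not anything shared with the parent.
    return (i, counter + i)

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # fork is only available on Unix
    with ctx.Pool(2) as pool:
        print(pool.map(show, range(4)))  # [(0, 100), (1, 101), (2, 102), (3, 103)]
    print(counter)  # still 100 in the parent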

What is multiprocess synchronization?

Synchronization between processes ensures that only one process accesses a shared resource at a time. multiprocessing is a package that supports spawning processes through an API and provides synchronization primitives such as Lock. It is used for both local and remote concurrency, lets the programmer use multiple processors on a given machine, and runs on both Windows and Unix.
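
A short sketch of process synchronization with a multiprocessing.Lock (the worker names are illustrative): the lock serialises access to stdout so output from different processes does not interleave.

import multiprocessing

def report(lock, name):
    with lock:  # only one process holds the lock, and prints, at a time
        print(name, "starting")
        print(name, "done")

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    workers = [
        multiprocessing.Process(target=report, args=(lock, "worker-%d" % i))
        for i in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()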


1 Answer

You're creating new processes inside a loop and then forgetting to close them once you're done with them. As a result, there comes a point where you have too many open processes, and since each worker keeps pipes (file descriptors) open to the parent, you eventually hit the open-files limit. This is a bad idea.

You could fix this by using a context manager, which automatically calls pool.terminate, or by calling pool.terminate yourself. Alternatively, why not create a pool outside the loop just once, and then send tasks to the processes inside it?

pool = multiprocessing.Pool()  # initialise your pool once (defaults to os.cpu_count() workers)

for nprocess in process_per_cycle:
    ...
    pool.map(cycle, offsets)  # delegate work inside your loop

pool.close()  # shut down the pool when all cycles are done

For more information, you could peruse the multiprocessing.Pool documentation.
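
As for what is happening underneath the with in the question: Pool supports the context management protocol, and its __exit__ method calls terminate(). So the fixed version behaves roughly like this sketch of the equivalence (not the actual library source):

pool = multiprocessing.Pool(nprocess)
try:
    pool.map(cycle, offsets)
finally:
    # __exit__ runs terminate(), stopping the workers and releasing
    # their pipes/file descriptors even if map() raises.
    pool.terminate()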

answered Oct 11 '22 by cs95