
Python: multiprocessing.map: If one process raises an exception, why aren't other processes' finally blocks called?

My understanding is that a finally clause must *always* be executed if its try block has been entered.

import random
from multiprocessing import Pool
from time import sleep

def Process(x):
  try:
    print x
    sleep(random.random())
    raise Exception('Exception: ' + x)
  finally:
    print 'Finally: ' + x

Pool(3).map(Process, ['1','2','3'])

The expected output is that for each x printed on its own by the print x statement, there is a matching 'Finally: x' line.

Example output:

$ python bug.py
1
2
3
Finally: 2
Traceback (most recent call last):
  File "bug.py", line 14, in <module>
    Pool(3).map(Process, ['1','2','3'])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 225, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 522, in get
    raise self._value
Exception: Exception: 2

It seems that an exception in one process terminates the parent and all sibling processes, even though the siblings still have work left to do.

Why am I wrong? Why is this behaviour correct? And if it is correct, how should one safely clean up resources in multiprocess Python?

asked Oct 09 '11 by Daniel Wagner-Hall

People also ask

How does multiprocessing process work in Python?

The multiprocessing package supports spawning processes: each spawned child loads and executes a target function. To wait for a child to terminate, or to otherwise coordinate concurrent work, the current process uses an API similar to that of the threading module.
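
For illustration, a minimal sketch of that pattern (Python 3 syntax; the function and process names are placeholders):

from multiprocessing import Process

def task(name):
    # runs in the child process
    print('running in child:', name)

if __name__ == '__main__':
    p = Process(target=task, args=('worker-1',))
    p.start()   # spawn the child
    p.join()    # wait for it, as with threading.Thread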

How does multiprocessing lock work in Python?

Python provides a mutual exclusion lock for use with processes via the multiprocessing.Lock class. An instance of the lock can be created and then acquired by a process before it enters a critical section, and released afterwards. Only one process can hold the lock at any time.
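
A minimal sketch of that acquire/release pattern (Python 3 syntax; names are illustrative):

from multiprocessing import Process, Lock

def report(lock, i):
    with lock:  # acquire before the critical section; released on exiting the block
        print('process', i, 'holds the lock')

if __name__ == '__main__':
    lock = Lock()
    workers = [Process(target=report, args=(lock, i)) for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()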

Does Python multiprocessing use multiple cores?

Python is not a single-threaded language, but Python processes typically use a single thread because of the GIL. Despite the GIL, libraries that perform computationally heavy tasks, such as numpy, scipy and pytorch, use C-based implementations under the hood, allowing multiple cores to be used.

Which is better multiprocessing or multithreading in Python?

Multiprocessing is easier to just drop in than threading, but has a higher memory overhead. If your code is CPU bound, multiprocessing is most likely the better choice, especially if the target machine has multiple cores or CPUs. See the sketch below for how interchangeable the two APIs are.
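
The "drop in" point is easiest to see with multiprocessing.dummy, which exposes the same Pool API backed by threads; a sketch (Python 3 syntax; cpu_bound is an arbitrary example task):

from multiprocessing import Pool                       # process-backed pool
from multiprocessing.dummy import Pool as ThreadPool   # thread-backed, same API

def cpu_bound(n):
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with Pool(4) as workers:        # processes: can use multiple cores
        print(workers.map(cpu_bound, [10**6] * 4))
    with ThreadPool(4) as workers:  # threads: identical calls, GIL-bound
        print(workers.map(cpu_bound, [10**6] * 4))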


1 Answer

Short answer: SIGTERM trumps finally.

Long answer: Turn on logging with mp.log_to_stderr():

import random
import multiprocessing as mp
import time
import logging

logger = mp.log_to_stderr(logging.DEBUG)

def Process(x):
    try:
        logger.info(x)
        time.sleep(random.random())
        raise Exception('Exception: ' + x)
    finally:
        logger.info('Finally: ' + x)

result = mp.Pool(3).map(Process, ['1','2','3'])

The logging output includes:

[DEBUG/MainProcess] terminating workers 

That message corresponds to this code in multiprocessing.pool._terminate_pool:

if pool and hasattr(pool[0], 'terminate'):
    debug('terminating workers')
    for p in pool:
        p.terminate()

Each p in pool is a multiprocessing.Process, and calling terminate (at least on non-Windows machines) sends SIGTERM:

from multiprocessing/forking.py:

class Popen(object):
    def terminate(self):
        ...
            try:
                os.kill(self.pid, signal.SIGTERM)
            except OSError, e:
                if self.wait(timeout=0.1) is None:
                    raise
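
You can observe that SIGTERM from the parent's side: on POSIX, a child killed by a signal reports the negated signal number as its exitcode. A sketch (Python 3 syntax, not part of the original answer):

import multiprocessing as mp
import signal
import time

def worker():
    time.sleep(100)

if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    time.sleep(0.5)    # give the child time to start
    p.terminate()      # sends SIGTERM on POSIX
    p.join()
    print(p.exitcode == -signal.SIGTERM)  # True: exitcode is -15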

So it comes down to what happens when a Python process in a try suite is sent a SIGTERM.

Consider the following example (test.py):

import time

def worker():
    try:
        time.sleep(100)
    finally:
        print('enter finally')
        time.sleep(2)
        print('exit finally')

worker()

If you run it and then send it a SIGTERM, the process ends immediately without entering the finally suite, as evidenced by the lack of output and the lack of delay.

In one terminal:

% test.py 

In second terminal:

% pkill -TERM -f "test.py" 

Result in first terminal:

Terminated 

Compare that with what happens when the process is sent a SIGINT (C-c):

In second terminal:

% pkill -INT -f "test.py" 

Result in first terminal:

enter finally
exit finally
Traceback (most recent call last):
  File "/home/unutbu/pybin/test.py", line 14, in <module>
    worker()
  File "/home/unutbu/pybin/test.py", line 8, in worker
    time.sleep(100)
KeyboardInterrupt
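
The difference is that Python installs a default SIGINT handler that raises KeyboardInterrupt, so the stack unwinds and finally runs, while SIGTERM's default disposition kills the process outright. If you need finally to run on SIGTERM too, one option (a sketch, not from the original answer) is to install a handler that raises:

import signal
import time

def raise_on_sigterm(signum, frame):
    # turn SIGTERM into an exception so the stack unwinds normally
    raise SystemExit('got SIGTERM')

signal.signal(signal.SIGTERM, raise_on_sigterm)

def worker():
    try:
        time.sleep(100)
    finally:
        print('enter finally')
        time.sleep(2)
        print('exit finally')

worker()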

Conclusion: SIGTERM trumps finally.
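
As for the question's "how should one safely clean up resources" part: one practical approach is to keep exceptions from escaping the worker at all, so the pool never terminates its siblings and every finally runs. A sketch (Python 3 syntax; safe_process is an illustrative name):

import random
import time
from multiprocessing import Pool

def safe_process(x):
    try:
        try:
            print(x)
            time.sleep(random.random())
            raise Exception('Exception: ' + x)
        finally:
            print('Finally: ' + x)  # now reached in every worker
    except Exception as e:
        return e                    # report the failure as a result

if __name__ == '__main__':
    results = Pool(3).map(safe_process, ['1', '2', '3'])
    print(results)                  # inspect which inputs failed

With this wrapper, map always completes, and the caller can decide what to do with any returned exceptions.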

answered Sep 21 '22 by unutbu