 

Python multiprocessing pool hangs at join?

I'm trying to run some python code on several files in parallel. The construct is basically:

def process_file(filename, foo, bar, baz=biz):
    # do stuff that may fail and cause exception

if __name__ == '__main__':
    # setup code setting parameters foo, bar, and biz

    psize = multiprocessing.cpu_count()*2
    pool = multiprocessing.Pool(processes=psize)

    map(lambda x: pool.apply_async(process_file, (x, foo, bar), dict(baz=biz)), sys.argv[1:])
    pool.close()
    pool.join()

I've previously used pool.map to do something similar and it worked great, but I can't seem to use that here because pool.map doesn't (appear to) allow me to pass in extra arguments (and using lambda to do it won't work because lambda can't be marshalled).
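For reference, the lambda limitation is easy to reproduce on its own; pickle refuses anonymous functions outright:

import pickle

# lambdas have no importable name for pickle to record, so this raises
# pickle.PicklingError
pickle.dumps(lambda x: x)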

So now I'm trying to get things to work using apply_async() directly. My issue is that the code seems to hang and never exit. A few of the files fail with an exception, but I don't see why that would cause join() to fail or hang. Interestingly, if none of the files fail with an exception, it does exit cleanly.

What am I missing?

Edit: When the function (and thus a worker) fails, I see this exception:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 376, in _handle_results
    task = get()
TypeError: ('__init__() takes at least 3 arguments (1 given)', <class 'subprocess.CalledProcessError'>, ())

If I see even one of these, the parent process hangs forever, never reaping the children and exiting.

asked Mar 09 '13 by clemej



2 Answers

Sorry to answer my own question, but I've found at least a workaround, so in case anyone else has a similar issue I want to post it here. I'll accept any better answers out there.

I believe the root of the issue is http://bugs.python.org/issue9400 . This tells me two things:

  • I'm not crazy, what I'm trying to do really is supposed to work
  • At least in python2, it is very difficult if not impossible to pickle 'exceptions' back to the parent process. Simple ones work, but many others don't.

In my case, my worker function was launching a subprocess that was segfaulting. This raised a CalledProcessError exception, which is not pickleable. For some reason, this makes the pool object in the parent go out to lunch and never return from the call to join().
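The failure can be reproduced outside the pool; here is a minimal sketch of the same pickling round-trip on Python 2.7 (with a made-up command string):

import pickle
from subprocess import CalledProcessError

exc = CalledProcessError(1, 'some-command')
# CalledProcessError.__init__ never fills in exc.args, so the pickled form
# records an empty argument tuple. Unpickling then calls the class with no
# arguments, producing the same TypeError as in the traceback above.
pickle.loads(pickle.dumps(exc))
# TypeError: __init__() takes at least 3 arguments (1 given)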

In my particular case, I don't care what the exception was. At most I want to log it and keep going. To do this, I simply wrap my top-level worker function in a try/except block. If the worker throws any exception, it is caught before it tries to propagate back to the parent process, logged, and the worker then exits normally since it's no longer trying to send the exception through. See below:

def process_file_wrapped(filename, foo, bar, baz=biz):
    try:
        process_file(filename, foo, bar, baz=baz)
    except:
        print('%s: %s' % (filename, traceback.format_exc()))

Then, I have my initial map function call process_file_wrapped() instead of the original one. Now my code works as intended.
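Putting it together, here is a compact sketch of how the wrapper slots into the original apply_async loop (the parameter values below are placeholders standing in for the real setup code):

import multiprocessing
import sys
import traceback

def process_file(filename, foo, bar, baz=None):
    # real per-file work goes here; it may raise anything, including
    # exceptions that can't be pickled back to the parent
    pass

def process_file_wrapped(filename, foo, bar, baz=None):
    try:
        process_file(filename, foo, bar, baz=baz)
    except:
        # log and swallow, so nothing unpicklable travels back through the pool
        print('%s: %s' % (filename, traceback.format_exc()))

if __name__ == '__main__':
    foo, bar, biz = 'foo', 'bar', 'biz'   # placeholder parameters
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() * 2)
    for name in sys.argv[1:]:
        pool.apply_async(process_file_wrapped, (name, foo, bar), dict(baz=biz))
    pool.close()
    pool.join()   # now returns even when individual workers raised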

answered Sep 29 '22 by clemej


You can actually use a functools.partial instance instead of a lambda in cases where the object needs to be pickled. partial objects are pickleable since Python 2.7 (and in Python 3).

pool.map(functools.partial(process_file, foo=foo, bar=bar, baz=biz), sys.argv[1:])
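As a quick sanity check (using placeholder values), a partial wrapping a module-level function survives the pickle round-trip that the pool requires; the fixed parameters are bound by keyword so pool.map supplies only the filename:

import functools
import pickle

def process_file(filename, foo, bar, baz=None):
    pass

bound = functools.partial(process_file, foo='foo', bar='bar', baz='biz')
# round-trips cleanly on 2.7+ (per the note above), unlike a lambda
pickle.loads(pickle.dumps(bound))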
answered Sep 29 '22 by nneonneo