I'm using Python 2.7.3. I have parallelised some code using subclassed multiprocessing.Process objects. If there are no errors in the code in my subclassed Process objects, everything runs fine. But if there are errors in the code in my subclassed Process objects, they will apparently crash silently (no stacktrace printed to the parent shell) and CPU usage will drop to zero. The parent code never crashes, giving the impression that execution is just hanging. Meanwhile it's really difficult to track down where the error in the code is because no indication is given as to where the error is.
I can't find any other questions on stackoverflow that deal with the same problem.
I guess the subclassed Process objects appear to crash silently because they can't print an error message to the parent's shell, but I would like to know what I can do about it so that I can at least debug more efficiently (and so that other users of my code can tell me when they run into problems too).
EDIT: my actual code is too complex, but a trivial example of a subclassed Process object with an error in it would be something like this:
from multiprocessing import Process, Queue
class Worker(Process):
    def __init__(self, inputQueue, outputQueue):
        super(Worker, self).__init__()
        self.inputQueue = inputQueue
        self.outputQueue = outputQueue
    def run(self):
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put(result)
What you really want is some way to pass exceptions up to the parent process, right? Then you can handle them however you want.
If you use concurrent.futures.ProcessPoolExecutor, this is automatic. If you use multiprocessing.Pool, it's trivial. If you use explicit Process and Queue, you have to do a bit of work, but it's not that much.
For example:
def run(self):
    try:
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put(result)
    except Exception as e:
        self.outputQueue.put(e)
Then, your calling code can just read Exceptions off the queue like anything else. Instead of this:
yield outq.pop()
do this:
result = outq.pop()
if isinstance(result, Exception):
    raise result
yield result
(I don't know what your actual parent-process queue-reading code does, because your minimal sample just ignores the queue. But hopefully this explains the idea, even though your real code doesn't actually work like this.)
This assumes that you want to abort on any unhandled exception that makes it up to run. If you want to pass back the exception and continue on to the next i in iter, just move the try into the for, instead of around it.
This also assumes that Exceptions are not valid values. If that's an issue, the simplest solution is to just push (result, exception) tuples:
def run(self):
    try:
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put((result, None))
    except Exception as e:
        self.outputQueue.put((None, e))
Then, your popping code does this:
result, exception = outq.pop()
if exception:
    raise exception
yield result
You may notice that this is similar to the node.js callback style, where you pass (err, result) to every callback. Yes, it's annoying, and you're going to mess up code in that style. But you're not actually using that anywhere except in the wrapper; all of your "application-level" code that gets values off the queue or gets called inside run just sees normal returns/yields and raised exceptions.
You may even want to consider building a Future to the spec of concurrent.futures (or using that class as-is), even though you're doing your job queuing and executing manually. It's not that hard, and it gives you a very nice API, especially for debugging.
Finally, it's worth noting that most code built around workers and queues can be made a lot simpler with an executor/pool design, even if you're absolutely sure you only want one worker per queue. Just scrap all the boilerplate, and turn the loop in the Worker.run method into a function (which just returns or raises as normal, instead of appending to a queue). On the calling side, again scrap all the boilerplate and just submit or map the job function with its parameters.
Your whole example can be reduced to:
def job(i):
    # (code that does stuff)
    1 / 0 # Dumb error
    # (more code that does stuff)
    return result
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    results = executor.map(job, range(10))
And it'll automatically handle exceptions properly.
As you mentioned in the comments, the traceback for an exception doesn't trace back into the child process; it only goes as far as the manual raise result call (or, if you're using a pool or executor, the guts of the pool or executor).
The reason is that multiprocessing.Queue is built on top of pickle, and pickling exceptions doesn't pickle their tracebacks. And the reason for that is that you can't pickle tracebacks. And the reason for that is that tracebacks are full of references to the local execution context, so making them work in another process would be very hard.
So… what can you do about this? Don't go looking for a fully general solution. Instead, think about what you actually need. 90% of the time, what you want is "log the exception, with traceback, and continue" or "print the exception, with traceback, to stderr and exit(1) like the default unhandled-exception handler". For either of those, you don't need to pass an exception at all; just format it on the child side and pass a string over. If you do need something more fancy, work out exactly what you need, and pass just enough information to manually put that together. If you don't know how to format tracebacks and exceptions, see the traceback module. It's pretty simple. And this means you don't need to get into the pickle machinery at all. (Not that it's very hard to copyreg a pickler or write a holder class with a __reduce__ method or anything, but if you don't need to, why learn all that?)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With