How to insulate a job/thread from crashes

Question

I'm working on a library where I'm farming various tasks out to some third-party libraries that do some relatively sketchy or dangerous platform-specific work. (Specifically, I'm writing a mathematical function parser that calls JIT compilers, like LLVM or libjit, to build machine code.) In practice, these third-party libraries have a tendency to be crashy (part of this is my fault, of course, but I still want some insurance).

I'd like, then, to be able to very gracefully deal with a job dying horribly -- SIGSEGV, SIGILL, etc. -- without bringing down the rest of my code (or the code of the users calling my library functions). To be clear, I don't care if that particular job can continue (I'm not going to try to repair a crash condition), nor do I really care about the state of the objects after such a crash (I'll discard them immediately if there's a crash). I just want to be able to detect that a crash has occurred, stop the crash from taking out the entire process, stop calling whatever's crashing, and resume execution.

(For a little more context, the code at the moment is a for loop, testing each of the available JIT-compilers. Some of these compilers might crash. If they do, I just want to execute continue; and get on with testing another compiler.)

Currently, I've got a signal()-based implementation that fails pretty horribly; of course, it's undefined behavior to longjmp() out of a signal handler, and signal handlers are pretty much expected to end with exit() or terminate(). Just throwing the code in another thread doesn't help by itself, at least the way I've tested it so far. I also can't hack out a way to make this work using C++ exceptions.

So, what's the best way to insulate a particular set of instructions / thread / job from crashes?

alex · Accepted Answer

Spawn a new process.
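As a rough illustration of what that could look like on a POSIX system, here is a minimal sketch. It assumes fork() and waitpid() are available; run_isolated and JobResult are hypothetical names made up for this example, not part of any library mentioned in the question.

```cpp
#include <cerrno>       // errno, EINTR
#include <functional>   // std::function
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WIFSIGNALED, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, _exit

// Hypothetical result type for this sketch.
enum class JobResult { Ok, Failed, Crashed, SpawnError };

// Hypothetical helper: runs `job` in a forked child process so that a fatal
// signal (SIGSEGV, SIGILL, SIGABRT, ...) kills only the child.
JobResult run_isolated(const std::function<bool()>& job)
{
    const pid_t pid = fork();
    if (pid < 0)
        return JobResult::SpawnError;

    if (pid == 0) {
        // Child: run the job and report the outcome through the exit status.
        // Exceptions are swallowed here so they cannot unwind into a copy of
        // the caller's stack inside the child.
        int code = 2;
        try {
            code = job() ? 0 : 1;
        } catch (...) {
        }
        // _exit skips atexit handlers and stdio flushing inherited from the parent.
        _exit(code);
    }

    // Parent: wait for the child, retrying if a signal interrupts the wait.
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited < 0)
        return JobResult::SpawnError;
    if (WIFSIGNALED(status))
        return JobResult::Crashed;   // the child was killed by a signal
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return JobResult::Ok;
    return JobResult::Failed;
}
```

In the loop described in the question, this would amount to something like "if (run_isolated(test_this_compiler) != JobResult::Ok) continue;". Two caveats: anything the job builds lives in the child's memory and disappears with it, so this only tells you whether the job survived; and fork() in a multithreaded process duplicates only the calling thread, so the child should do as little as possible outside the job itself.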

morechilli · Answer

What output do you collect when a job succeeds?

I ask because if the output is low bandwidth I would be tempted to run each job in its own process.

Each of these crashy jobs you fire up has a high chance of corrupting memory used elsewhere in your process.

Processes offer the best protection.
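If the output really is small, one way to get it back from the child is a pipe. The following is only a sketch under assumptions: a POSIX system, a result that fits in a single double, and a hypothetical helper name (compute_isolated) invented for this example.

```cpp
#include <cerrno>       // errno, EINTR
#include <cstddef>      // std::size_t
#include <functional>   // std::function
#include <sys/types.h>  // pid_t, ssize_t
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // pipe, fork, read, write, close, _exit

// Hypothetical helper: runs `job` in a child process and copies its result
// into `out`. Returns false, leaving `out` untouched, if the child crashed,
// threw, or failed to deliver a complete result.
bool compute_isolated(const std::function<double()>& job, double& out)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (pid == 0) {
        // Child: compute, then send the raw bytes of the result to the parent.
        close(fds[0]);
        int code = 1;
        try {
            const double value = job();
            if (write(fds[1], &value, sizeof value) ==
                static_cast<ssize_t>(sizeof value))
                code = 0;
        } catch (...) {
            code = 2;
        }
        _exit(code);
    }

    // Parent: close the write end so read() sees EOF if the child dies early.
    close(fds[1]);
    double value = 0.0;
    char* buf = reinterpret_cast<char*>(&value);
    std::size_t got = 0;
    while (got < sizeof value) {
        const ssize_t n = read(fds[0], buf + got, sizeof value - got);
        if (n > 0)
            got += static_cast<std::size_t>(n);
        else if (n == 0 || errno != EINTR)
            break;  // EOF or a real error
    }
    close(fds[0]);

    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);

    const bool clean =
        waited == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (!clean || got != sizeof value)
        return false;

    out = value;
    return true;
}
```

Because parent and child share no writable memory after the fork, a job that scribbles over its own heap or stack cannot corrupt the caller. Larger or structured results would need a proper serialization format rather than raw bytes.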

John Dibling · Answer

Processes offer the best protection, but it's possible you can't do that.

If your threads' entry points are functions you wrote (for example, ThreadProc in the Windows world), then you can wrap them in try{...}catch(...) blocks. If you want to communicate that an exception has occurred, then you can communicate specific error codes back to the main thread or use some other mechanism. If you want to log not only that an exception has occurred but what that exception was, then you'll need to catch specific exception types and extract diagnostic information from them to communicate back to the main thread. For example:

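#include <exception>
#include <string>

// Placeholder: however you choose to report failures to the main thread.
void tell_main_thread_what_went_wrong(const std::string& reason);

int my_temperamental_thread()
{
  try
  {
    // ... magic happens ...
    return 0;
  }
  catch( const std::exception& ex )
  {
    // ... or maybe it doesn't ...
    // A standard exception: forward its message.
    std::string reason = ex.what();
    tell_main_thread_what_went_wrong(reason);
    return 1;
  }
  catch( ... )
  {
    // ... definitely not magical happenings here ...
    // Only C++ exceptions land here; POSIX signals such as SIGSEGV do not.
    tell_main_thread_what_went_wrong("uh, something bad and undefined");
    return 2;
  }
}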

Be aware that if you go this way you run the risk of hosing the host process when the exceptions do occur. You say you're not trying to correct the problem, but how do you know the malignant thread didn't eat your stack for example? Catch-and-ignore is a great way to create horribly confounding bugs.

Tags:

c++

multithreading

crash

signals

setjmp

Charles Pence
