I'm working on a library where I'm farming various tasks out to some third-party libraries that do some relatively sketchy or dangerous platform-specific work. (Specifically, I'm writing a mathematical function parser that calls JIT compilers, like LLVM or libjit, to build machine code.) In practice, these third-party libraries have a tendency to be crashy (part of this is my fault, of course, but I still want some insurance).
I'd like, then, to be able to very gracefully deal with a job dying horribly -- SIGSEGV, SIGILL, etc. -- without bringing down the rest of my code (or the code of the users calling my library functions). To be clear, I don't care if that particular job can continue (I'm not going to try to repair a crash condition), nor do I really care about the state of the objects after such a crash (I'll discard them immediately if there's a crash). I just want to be able to detect that a crash has occurred, stop the crash from taking out the entire process, stop calling whatever's crashing, and resume execution.
(For a little more context, the code at the moment is a for loop, testing each of the available JIT compilers. Some of these compilers might crash. If they do, I just want to execute continue; and get on with testing another compiler.)
Currently, I've got a signal()-based implementation that fails pretty horribly; of course, it's undefined behavior to longjmp() out of a signal handler, and handlers for fatal signals are pretty much expected to end by terminating the process with _Exit() or abort(). Just throwing the code in another thread doesn't help by itself, at least the way I've tested it so far. I also can't hack out a way to make this work using C++ exceptions.
So, what's the best way to insulate a particular set of instructions / thread / job from crashes?
Spawn a new process.
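As a rough sketch of what that could look like on a POSIX system: the parent forks, the child runs the job, and the parent checks with waitpid() whether the child exited normally or was killed by a signal. The names run_isolated and JobOutcome are made up for this illustration, and the job is assumed to report success as a bool. Keep in mind that fork() in a multithreaded parent only duplicates the calling thread, so this fits best when the fork happens from simple, single-threaded code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <functional>

// Hypothetical outcome type for this sketch (not from any library).
enum class JobOutcome { Succeeded, Failed, Crashed, SpawnError };

// Runs `job` in a forked child process and reports how the child ended.
// `job` is assumed to return true on success. POSIX only.
JobOutcome run_isolated(const std::function<bool()>& job) {
    pid_t pid = fork();
    if (pid < 0) return JobOutcome::SpawnError;

    if (pid == 0) {
        // Child: run the job, then leave via _exit() so the parent's
        // atexit handlers and stdio buffers are not run or flushed twice.
        bool ok = job();
        _exit(ok ? 0 : 1);
    }

    // Parent: wait for the child, retrying if a signal interrupts the wait.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return JobOutcome::SpawnError;
    }

    // Killed by a signal such as SIGSEGV or SIGILL.
    if (WIFSIGNALED(status)) return JobOutcome::Crashed;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return JobOutcome::Succeeded;
    return JobOutcome::Failed;
}
```

In the loop from the question, a result of JobOutcome::Crashed would be the point at which to continue; to the next compiler. Whatever the child built in memory is gone when it exits, so this only tells you whether the job survived.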
What output do you collect when a job succeeds?
I ask because if the output is low bandwidth I would be tempted to run each job in its own process.
Each of these crashy jobs you fire up has a high chance of corrupting memory used elsewhere in your process.
Processes offer the best protection.
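If the output really is low bandwidth, one way to get it back is a pipe between child and parent. The sketch below assumes, purely for illustration, that a job produces a single double; compute_isolated is a made-up name. The child writes the raw bytes of its result, and the parent treats a short read or an abnormal exit as a failed job. POSIX only.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <functional>

// Runs `job` in a forked child and fetches its result through a pipe.
// Returns true and fills `result` only if the child exited cleanly and
// delivered a complete value; returns false on a crash or any error.
bool compute_isolated(const std::function<double()>& job, double& result) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (pid == 0) {
        // Child: compute, send the raw bytes of the result, and exit.
        close(fds[0]);
        double value = job();
        ssize_t n = write(fds[1], &value, sizeof value);
        _exit(n == static_cast<ssize_t>(sizeof value) ? 0 : 1);
    }

    // Parent: close the write end so a dead child shows up as EOF.
    close(fds[1]);
    double value = 0.0;
    char* out = reinterpret_cast<char*>(&value);
    size_t got = 0;
    while (got < sizeof value) {
        ssize_t n = read(fds[0], out + got, sizeof value - got);
        if (n > 0) { got += static_cast<size_t>(n); continue; }
        if (n < 0 && errno == EINTR) continue;
        break;  // EOF (child died before writing) or a read error
    }
    close(fds[0]);

    // Reap the child and check that it exited normally with status 0.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return false;
    }
    bool clean = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (!clean || got != sizeof value) return false;

    result = value;
    return true;
}
```

Anything larger or more structured than a few bytes would need a proper serialization format, or shared memory, rather than a raw write of one value.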
Processes offer the best protection, but it's possible you can't do that.
If your threads' entry points are functions you wrote (for example, ThreadProc in the Windows world), then you can wrap them in try{...}catch(...) blocks. If you want to communicate that an exception has occurred, then you can communicate specific error codes back to the main thread or use some other mechanism. If you want to log not only that an exception has occurred but what that exception was, then you'll need to catch specific exception types and extract diagnostic information from them to communicate back to the main thread. À la:
#include <exception>
#include <string>

// Supplied elsewhere: reports the failure back to the main thread.
void tell_main_thread_what_went_wrong(const std::string& reason);

int my_tempermental_thread()
{
    try
    {
        // ... magic happens ...
        return 0;
    }
    catch( const std::exception& ex )
    {
        // ... or maybe it doesn't ...
        std::string reason = ex.what();
        tell_main_thread_what_went_wrong(reason);
        return 1;
    }
    catch( ... )
    {
        // ... definitely not magical happenings here ...
        tell_main_thread_what_went_wrong("uh, something bad and undefined");
        return 2;
    }
}
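For the "some other mechanism" part, one option in standard C++11 is std::async with std::future, which carries an exception from the worker thread over to whoever calls get(). This is a different technique from the hand-rolled reporting above, shown only as a sketch; temperamental_job and run_on_thread are hypothetical names. It covers C++ exceptions only and does nothing for a SIGSEGV or SIGILL.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical job that signals failure by throwing.
int temperamental_job(bool should_fail) {
    if (should_fail) throw std::runtime_error("JIT backend rejected the input");
    return 0;
}

// Runs the job on another thread. Returns an empty string on success,
// otherwise a description of whatever the job threw.
std::string run_on_thread(bool should_fail) {
    std::future<int> result =
        std::async(std::launch::async, temperamental_job, should_fail);
    try {
        result.get();  // rethrows here anything the job threw on its thread
        return "";
    } catch (const std::exception& ex) {
        return ex.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

The calling thread decides what to do with the failure, so the worker needs no catch blocks of its own.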
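Be aware that if you go this way you run the risk of hosing the host process when the exceptions do occur. You say you're not trying to correct the problem, but how do you know the malignant thread didn't eat your stack, for example? Catch-and-ignore is a great way to create horribly confounding bugs. Note too that catch(...) only sees C++ exceptions; on POSIX systems a SIGSEGV or SIGILL is delivered as a signal and will not land in any catch block.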