I have a few functions in my code that are randomly causing a SegmentationFault error. I've identified them by enabling the faulthandler module. I'm a bit stuck and have no idea how to reliably eliminate this problem.

I'm thinking about a workaround. Since the functions crash randomly, I could retry them after a failure. The problem is that there's no way to recover from a SegmentationFault crash within the same process.

The best idea I have so far is to rewrite these functions a bit and run them via subprocess. That way a crashed function won't bring down the whole application and can simply be retried. However, some of the functions are quite small and executed often, so spawning a subprocess each time would significantly slow down my app. Is there a way to execute a function in a separate context, faster than a subprocess, that won't crash the whole program in case of a segfault?
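For context, the faulthandler module mentioned above is enabled with a single call; a minimal sketch:

```python
import faulthandler
import sys

# Dump a Python-level traceback to stderr if the process receives
# SIGSEGV, SIGFPE, SIGABRT, SIGBUS, or SIGILL
faulthandler.enable(file=sys.stderr)

print(faulthandler.is_enabled())  # True
```

The traceback identifies which Python frame was active when the crash happened, which is how the offending functions can be tracked down.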
On both Windows and Linux, the segfault handler function is passed a "context struct", which includes the state of the registers at the failure site. Ostensibly, this is so people can repair the problem that caused the segfault (it also lets you do nifty things like userspace segment handling).
Check shell limits. Usually it is the limit on stack size that causes this kind of problem. To check memory limits, use the `ulimit` command in bash or ksh, or the `limit` command in csh or tcsh. Try setting the stack size higher, and then re-run your program to see if the segfault goes away.
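In bash that looks like the following (raising the limit may require sufficient privileges, and `./your_program` is a placeholder for your own binary):

```shell
# Show the current stack-size limit in kB ("unlimited" means no limit)
ulimit -s

# Raise it for this shell session, then re-run the crashing program
ulimit -s unlimited
./your_program
```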
A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).
tl;dr: You can write C code using signal, setjmp, and longjmp.
You have multiple choices to handle SIGSEGV:

- the subprocess library
- the multiprocessing library
- a custom signal handler

Subprocess and fork have already been described, so I will focus on the signal-handler point of view.
From a kernel perspective, there is no difference between SIGSEGV and other signals like SIGUSR1, SIGQUIT, SIGINT, etc. In fact, some runtimes (like the JVM) use them as a way of communicating.
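For ordinary asynchronous signals, installing a handler even works from Python; a quick sketch on POSIX (SIGUSR1 does not exist on Windows):

```python
import os
import signal

received = []

# Install a Python-level handler for SIGUSR1 and send the signal to ourselves
signal.signal(signal.SIGUSR1, lambda signum, frame: received.append(signum))
os.kill(os.getpid(), signal.SIGUSR1)

print(received == [signal.SIGUSR1])  # True
```

As explained next, this approach does not carry over to SIGSEGV.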
Unfortunately, you can't override this signal handler from Python code. See the docs:
It makes little sense to catch synchronous errors like SIGFPE or SIGSEGV that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the faulthandler module to report on synchronous errors.
This means error management must be done in C code.
You can write a custom signal handler and use setjmp and longjmp to save and restore the stack context.
For example, here is a simple CPython C extension:
#include <signal.h>
#include <setjmp.h>
#define PY_SSIZE_T_CLEAN
#include <Python.h>

static jmp_buf jmpctx;

static void handle_segv(int signo)
{
    // Jump back to the setjmp point in trigger_segfault
    longjmp(jmpctx, 1);
}

static PyObject *
install_sig_handler(PyObject *self, PyObject *args)
{
    signal(SIGSEGV, handle_segv);
    Py_RETURN_TRUE;
}

static PyObject *
trigger_segfault(PyObject *self, PyObject *args)
{
    if (!setjmp(jmpctx))
    {
        // Writing through a NULL pointer triggers a segfault
        int *x = NULL;
        *x = 42;
        Py_RETURN_TRUE; // Never reached
    }
    Py_RETURN_FALSE;
}

static PyMethodDef SpamMethods[] = {
    {"install_sig_handler", install_sig_handler, METH_VARARGS, "Install SIGSEGV handler"},
    {"trigger_segfault", trigger_segfault, METH_VARARGS, "Trigger a segfault"},
    {NULL, NULL, 0, NULL},
};

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "crash",
    "Crash and recover",
    -1,
    SpamMethods,
};

PyMODINIT_FUNC
PyInit_crash(void)
{
    return PyModule_Create(&spammodule);
}
And the caller app:
import crash
print("Install custom sighandler")
crash.install_sig_handler()
print("bad_func: before")
retval = crash.trigger_segfault()
print("bad_func: after (retval:", retval, ")")
This produces the following output:
Install custom sighandler
bad_func: before
bad_func: after (retval: False )
Pros:

- SIGSEGV is handled as a regular signal, so error handling is fast (no extra process per call).

Cons:

- After the longjmp out of the handler, the interrupted C code never gets to clean up, so the process may be left in an inconsistent state.
Keep in mind that a segmentation fault is a really serious error! Always try to fix it first instead of hiding it.
I had some unreliable C extensions throw segfaults every once in a while and, since there was no way I was going to be able to fix that, what I did was create a decorator that would run the wrapped function in a separate process. That way you can stop segfaults from killing the main process.
Something like this: https://gist.github.com/joezuntz/e7e7764e5b591ed519cfd488e20311f1
Mine was a bit simpler, and it did the job for me. Additionally it lets you choose a timeout and a default return value in case there was a problem:
#! /usr/bin/env python3
# std imports
import logging
import multiprocessing as mp


def parametrized(dec):
    """This decorator can be used to create other decorators that accept arguments"""
    def layer(*args, **kwargs):
        def repl(f):
            return dec(f, *args, **kwargs)
        return repl
    return layer


@parametrized
def sigsev_guard(fcn, default_value=None, timeout=None):
    """Used as a decorator with arguments.
    The decorated function will be called with its input arguments in another process.
    If the execution lasts longer than *timeout* seconds, it will be considered failed.
    If the execution fails, *default_value* will be returned.
    """
    def _fcn_wrapper(*args, **kwargs):
        q = mp.Queue()
        p = mp.Process(target=lambda q: q.put(fcn(*args, **kwargs)), args=(q,))
        p.start()
        p.join(timeout=timeout)
        if p.is_alive():
            # Still running after the timeout: kill it so it doesn't linger
            p.terminate()
            p.join()
        exit_code = p.exitcode
        if exit_code == 0:
            return q.get()
        logging.warning('Process did not exit correctly. Exit code: {}'.format(exit_code))
        return default_value
    return _fcn_wrapper
So you would use it like:
@sigsev_guard(default_value=-1, timeout=60)
def your_risky_function(a,b,c,d):
...
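If a fresh process per call is still too slow, a long-lived worker pool from the standard library amortizes the startup cost across calls; a sketch (the `run_guarded` helper and its defaults are my own naming, not part of the answer above). A crashed worker raises `BrokenProcessPool`, which poisons the whole pool, so the pool has to be recreated afterwards:

```python
import concurrent.futures
from concurrent.futures.process import BrokenProcessPool

# One long-lived pool; worker startup cost is paid once, not per call
_pool = concurrent.futures.ProcessPoolExecutor(max_workers=1)


def run_guarded(fcn, *args, default=None, timeout=None):
    """Run fcn(*args) in a worker process.

    Returns `default` if the worker dies (e.g. on a segfault) or the call
    exceeds `timeout` seconds. fcn must be picklable (module-level).
    """
    global _pool
    try:
        return _pool.submit(fcn, *args).result(timeout=timeout)
    except BrokenProcessPool:
        # The dead worker poisons the whole pool; rebuild it for later calls
        _pool = concurrent.futures.ProcessPoolExecutor(max_workers=1)
        return default
    except concurrent.futures.TimeoutError:
        return default
```

Note that on a timeout the worker keeps running in the background; production code would also want to shut down and rebuild the pool in that case.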