
Recover from segfault in Python

I have a few functions in my code that are randomly causing a segmentation fault. I've identified them by enabling faulthandler. I'm a bit stuck and have no idea how to reliably eliminate this problem.

I'm thinking about a workaround. Since the functions crash randomly, I could retry them after a failure. The problem is that there's no way to recover from a segmentation fault within the same process. The best idea I have for now is to rewrite these functions a bit and run them via subprocess, so that a crashed function won't crash the whole application and can be retried.

Some of the functions are quite small and frequently executed, so spawning a subprocess for each call would significantly slow down my app. Is there any way to execute a function in a separate context, faster than a subprocess, that won't crash the whole program in case of a segfault?

Djent asked Nov 02 '20

People also ask

Can you recover from a segfault?

On both Windows and Linux, the segfault handler function is passed a "context struct", which includes the state of the registers at the failure site. Ostensibly, this is so people can repair the problem that caused the segfault (it also lets you do nifty things like userspace segment handling).

How do you find the cause of segfault?

Check shell limits. Usually it is the limit on stack size that causes this kind of problem. To check memory limits, use the ulimit command in bash or ksh, or the limit command in csh or tcsh. Try setting the stack size higher, and then re-run your program to see if the segfault goes away.
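The same check can be done from Python itself with the standard resource module (POSIX only; a minimal sketch):

```python
import resource

# Equivalent of `ulimit -s` in bash: query the stack-size limit in bytes
# (resource.RLIM_INFINITY means "unlimited").
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack limit (soft, hard):", soft, hard)

# Try raising the soft limit to the hard limit before re-running the
# crashing code; the OS may refuse depending on privileges.
try:
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
except (ValueError, OSError):
    pass
```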

What happens during a segfault?

A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).


2 Answers

tl;dr: You can handle the signal in C code using signal, setjmp and longjmp.


You have multiple options for handling SIGSEGV:

  • spawning a sub-process using the subprocess library
  • forking using the multiprocessing library
  • writing a custom signal handler

The subprocess and fork approaches have already been described, so I will focus on the signal-handler point of view.

Writing signal handler

From a kernel perspective, there is no difference between SIGSEGV and any other signal like SIGUSR1, SIGQUIT, SIGINT, etc. In fact, some runtimes (like the JVM) use them as a means of communication.
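To illustrate, for an asynchronous signal like SIGUSR1 a Python-level handler works just fine (a minimal sketch, POSIX only):

```python
import os
import signal

received = []

def on_usr1(signum, frame):
    received.append(signum)

# From the interpreter's point of view this is just another signal:
signal.signal(signal.SIGUSR1, on_usr1)
os.kill(os.getpid(), signal.SIGUSR1)  # send it to ourselves

print(received)  # [<Signals.SIGUSR1: 10>]
```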

Unfortunately, you can't usefully handle SIGSEGV from Python code. See the docs:

It makes little sense to catch synchronous errors like SIGFPE or SIGSEGV that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the faulthandler module to report on synchronous errors.
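As the quoted docs note, faulthandler can at least report where the crash happened, even though it cannot prevent it. Enabling it is a one-liner (a minimal sketch):

```python
import faulthandler

# Dump the Python traceback if the process receives SIGSEGV, SIGFPE,
# SIGABRT, SIGBUS or SIGILL. This reports the crash; it does not stop it.
faulthandler.enable()
print(faulthandler.is_enabled())  # True
```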

This means the error management has to be done in C code.

You can write custom signal handler and use setjmp and longjmp to save and restore stack context.

For example, here is a simple CPython C extension:

#include <signal.h>
#include <setjmp.h>

#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* sigsetjmp/siglongjmp (rather than setjmp/longjmp) also save and restore
 * the signal mask, so SIGSEGV is unblocked again after we jump out of the
 * handler and a later segfault can be caught too. */
static sigjmp_buf jmpctx;

static void handle_segv(int signo)
{
    (void)signo;
    siglongjmp(jmpctx, 1);
}

static PyObject *
install_sig_handler(PyObject *self, PyObject *args)
{
    signal(SIGSEGV, handle_segv);
    Py_RETURN_TRUE;
}

static PyObject *
trigger_segfault(PyObject *self, PyObject *args)
{
    if (!sigsetjmp(jmpctx, 1))  /* 1 = save the current signal mask */
    {
        // Writing through a NULL pointer triggers a segfault
        int *x = NULL;
        *x = 42;

        Py_RETURN_TRUE; // Never reached
    }

    Py_RETURN_FALSE;
}

static PyMethodDef SpamMethods[] = {
    {"install_sig_handler", install_sig_handler, METH_VARARGS, "Install SIGSEGV handler"},
    {"trigger_segfault", trigger_segfault, METH_VARARGS, "Trigger a segfault"},
    {NULL, NULL, 0, NULL},
};

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "crash",
    "Crash and recover",
    -1,
    SpamMethods,
};

PyMODINIT_FUNC
PyInit_crash(void)
{
    return PyModule_Create(&spammodule);
}
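Before the caller can import it, the extension has to be compiled. A minimal setup.py sketch (assuming the C file above is saved as crash.c; build in place with python setup.py build_ext --inplace):

```python
# setup.py -- builds the "crash" CPython extension from crash.c
from setuptools import Extension, setup

setup(
    name="crash",
    ext_modules=[Extension("crash", sources=["crash.c"])],
)
```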

And the caller app:

import crash

print("Install custom sighandler")
crash.install_sig_handler()

print("bad_func: before")
retval = crash.trigger_segfault()
print("bad_func: after (retval:", retval, ")")

This produces the following output:

Install custom sighandler
bad_func: before
bad_func: after (retval: False )

Pros and cons

Pros:

  • From the OS perspective, the app just catches SIGSEGV like a regular signal, so error handling is fast.
  • It does not require forking (not always possible if your app holds various kinds of file descriptors, sockets, ...).
  • It does not require spawning sub-processes (not always possible, and a much slower method).

Cons:

  • It might cause memory leaks (resources allocated before the jump are never released).
  • It might hide undefined / dangerous behavior.

Keep in mind that a segmentation fault is a really serious error! Always try to fix it first instead of hiding it.

A few links and references:

  • https://docs.python.org/3/library/signal.html#execution-of-python-signal-handlers
  • How to write a signal handler to catch SIGSEGV?
  • https://docs.python.org/3/extending/extending.html#extending-python-with-c-or-c
  • https://www.cplusplus.com/reference/csetjmp/setjmp/
  • https://www.cplusplus.com/reference/csetjmp/longjmp/
arthurlm answered Oct 01 '22


I had some unreliable C extensions throwing segfaults every once in a while and, since there was no way I was going to be able to fix that, what I did was create a decorator that runs the wrapped function in a separate process. That way segfaults can't kill the main process.

Something like this: https://gist.github.com/joezuntz/e7e7764e5b591ed519cfd488e20311f1

Mine was a bit simpler, and it did the job for me. Additionally, it lets you choose a timeout and a default return value in case there was a problem:

#! /usr/bin/env python3

# std imports
import logging
import multiprocessing as mp


def parametrized(dec):
    """This decorator can be used to create other decorators that accept arguments"""

    def layer(*args, **kwargs):
        def repl(f):
            return dec(f, *args, **kwargs)

        return repl

    return layer


@parametrized
def sigsev_guard(fcn, default_value=None, timeout=None):
    """Used as a decorator with arguments.
    The decorated function will be called with its input arguments in another process.

    If the execution lasts longer than *timeout* seconds, it will be considered failed.

    If the execution fails, *default_value* will be returned.
    """

    def _fcn_wrapper(*args, **kwargs):
        q = mp.Queue()
        # NOTE: a lambda target only works with the "fork" start method
        # (POSIX); under "spawn" the target must be picklable.
        p = mp.Process(target=lambda q: q.put(fcn(*args, **kwargs)), args=(q,))
        p.start()
        p.join(timeout=timeout)
        exit_code = p.exitcode

        if exit_code == 0:
            return q.get()

        logging.warning('Process did not exit correctly. Exit code: {}'.format(exit_code))
        return default_value

    return _fcn_wrapper

So you would use it like:


@sigsev_guard(default_value=-1, timeout=60)
def your_risky_function(a, b, c, d):
    ...

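For completeness, here is a self-contained sketch of the same run-in-a-child-process idea (the names run_guarded and crashy are mine, not from the gist; it assumes the POSIX "fork" start method, and the crash is simulated by sending SIGSEGV rather than by a real bug):

```python
import multiprocessing as mp
import os
import signal


def _call(q, fcn, args, kwargs):
    q.put(fcn(*args, **kwargs))


def run_guarded(fcn, *args, default=None, timeout=None, **kwargs):
    """Run fcn in a forked child; return `default` if the child dies."""
    ctx = mp.get_context("fork")  # POSIX-only; avoids pickling fcn
    q = ctx.Queue()
    p = ctx.Process(target=_call, args=(q, fcn, args, kwargs))
    p.start()
    p.join(timeout)
    if p.is_alive():  # timed out: kill the child and give up
        p.terminate()
        p.join()
        return default
    return q.get() if p.exitcode == 0 else default


def crashy():
    # Deterministically die with SIGSEGV to simulate the buggy function.
    os.kill(os.getpid(), signal.SIGSEGV)


if __name__ == "__main__":
    print(run_guarded(crashy, default=-1))        # prints -1
    print(run_guarded(len, [1, 2, 3], default=-1))
```

Because the crash is confined to the forked child, the parent only sees a non-zero exit code and substitutes the default value.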
imochoa answered Oct 01 '22