How to get proper backtrace in process signal handler (armv7-uclibc)?

Tags: linux, arm, uclibc

I have searched many times for the right way to use backtrace() in a signal handler and tried almost everything, but I have not been able to get a useful backtrace in my signal handler (this is not a SIGUSR1 handler).

  • enabled UCLIBC_HAS_BACKTRACE=y in the uClibc config and rebuilt it
  • verified that libubacktrace.so is created
  • compiled my application binaries with the following options: -g -rdynamic -fexceptions or -funwind-tables
  • The binary itself seems to be "stripped"

However, I was not able to get a full backtrace from the signal handler. Only the addresses of the functions that I call inside the signal handler were printed.

If I use the target gdb binary and attach to the process with gdb --pid, I get the full backtrace properly.

I also tried pstack (pstack-1.2 with the ARM patch, but it was horrible: nothing was printed), which was not very helpful.

Any advice?


1) Compiler options in Makefile

CFLAGS += -g -fexceptions -funwind-tables -Werror $(WARN) ...

2) Code

The code is extremely simple.

#define CALLSTACK_SIZE 10

static void print_stack(void) {
    int i, nptrs;
    void *buf[CALLSTACK_SIZE + 1];
    char **strings;

    nptrs = backtrace(buf, CALLSTACK_SIZE);
    printf("%s: backtrace() returned %d addresses\n", __func__, nptrs);

    strings = backtrace_symbols(buf, nptrs);

    if(strings == NULL) {
        printf("%s: no backtrace captured\n", __func__);
        return;
    }

    for(i = 0; i < nptrs; i++) {
        printf("%s\n", strings[i]);
    }

    free(strings);
}

...
static void sigHandler(int signum)
{
    printf("%s: signal %d\n", __FUNCTION__, signum);
    switch(signum ) {
    case SIGUSR2:
        // told to quit
        print_stack();
        break;
    default:
        break;
    }
}

user2526111 asked May 01 '15 06:05


2 Answers

Read carefully signal(7) and signal-safety(7).

A signal handler may call (directly or indirectly) only async-signal-safe functions (practically speaking, mostly syscalls(2)), and backtrace(3), printf(3), malloc(3), and free(3) are not async-signal-safe. So your code is incorrect: the signal handler sigHandler calls printf and, indirectly through print_stack, free, and neither is async-signal-safe.

So your only option is to use the gdb debugger.

Read more about POSIX signal.h and signal concepts. Practically speaking, nearly the only sensible thing a signal handler can do is set a global, thread-local, or static volatile sig_atomic_t flag, which has to be tested elsewhere. It could also write(2) a few bytes directly into a pipe(7) that your application reads elsewhere (e.g. in its event loop, if it is a GUI application).

You could also use Ian Taylor's libbacktrace from inside GCC (assuming your program is compiled with debug info, e.g. with -g). It is not guaranteed to work in signal handlers (since it is not using only async-signal-safe functions), but it is practically quite useful.

Note that the kernel sets up a call frame (on the call stack) for sigreturn(2) when it delivers a signal.

You might also use (especially if your application is single-threaded) sigaltstack(2) to have an alternate signal stack. I'm not sure it would be helpful.

If you have an event loop, you might consider using the Linux-specific signalfd(2) and having your event loop poll it. For SIGTERM, SIGQUIT, or SIGALRM it is quite a useful trick.

Basile Starynkevitch answered Oct 05 '22 10:10


I would like to add something to @Basile Starynkevitch's answer, which is overly pedantic. While it's true that your signal handler isn't async-signal-safe, there's a good chance it will often work on Linux, so if you are seeing results being printed out, that isn't what's causing your issue of not seeing relevant stack information.

Some more likely problems include:

  1. Incorrect compiler flags for your platform. Backtraces often work fine on x86 without special flags, but ARM can be more finicky. I have tried a few whose names I can't remember, but the most important ones to try are -fno-omit-frame-pointer and -fasynchronous-unwind-tables.

  2. The code that's crashing was called through code that wasn't compiled with correct flags for getting stack traces. For example, stack traces that originate in code that calls back from a .so that wasn't compiled with correct compiler flags will often result in duplicate or truncated backtraces.

  3. The signal that you are getting the backtrace for is not a thread-directed signal, but a process-directed one. Practically speaking, a thread-directed signal is one like SIGSEGV, raised when the thread crashes, or one that another thread sends to a specific thread with something like pthread_kill. See man 7 signal for more information.

With that out of the way, I would like to address what you can do in your signal handler to get backtraces. It is true that you shouldn't call any stdio functions, malloc(), free(), etc., but it is not true that you can't call backtrace with a sane version of glibc/libgcc. From here, you can see that backtrace_symbols_fd is currently async-signal-safe. You can also see that backtrace is not. It looks very unsafe. However, man 3 backtrace tells us why these restrictions apply:

backtrace_symbols_fd() does not call malloc(3), and so can be employed in situations where the latter function might fail, but see NOTES.

Later:

backtrace() and backtrace_symbols_fd() don't call malloc() explicitly, but they are part of libgcc, which gets loaded dynamically when first used. Dynamic loading usually triggers a call to malloc(3). If you need certain calls to these two functions to not allocate memory (in signal handlers, for example), you need to make sure libgcc is loaded beforehand.

A quick look at the source for backtrace confirms that the unsafe parts involve dynamically loading libgcc. You could get around this by statically linking both glibc and libgcc, but the most robust way of doing it is by making sure that libgcc is loaded before any signals are generated.

The way I do this is by calling backtrace once during program startup. Note that you must ask for at least one symbol, or the function early-outs without loading libgcc. Something like this would work:

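#include <execinfo.h>  // backtrace(), backtrace_symbols_fd()
#include <unistd.h>    // write(), STDERR_FILENO

// On Linux, especially on ARM, you want to use the sigaction version of this call.
// See my comments below.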
static void
handle_signal(int sig)
{
    // Check signal type or whatever you want to do.
    // ...
    
    void* symbols[100];
    int n = backtrace(symbols, 100);
    
    // You could also either call a string formatting routine that you know
    // is async-signal-safe or save your backtrace and let another thread know
    // that this thread has crashed and the backtrace needs to be printed.
    //
    write(STDERR_FILENO, "Crash:\n", 7);
    backtrace_symbols_fd(symbols, n, STDERR_FILENO);

    // In the case of notifying another thread, which is what I do, you would
    // do something like this:
    //
    // threadLocalSymbolCount = backtrace(threadLocalSymbols, 100);
    // sem_post() or write() to an eventfd or whatever.
}

int main(int argc, char** argv)
{
    void* dummy = NULL;
    backtrace(&dummy, 1);
    
    // Setup custom signal handling
    // ...

    function_that_crashes();

    return 0;
}

EDIT: The OP mentions that they are using uclibc instead of glibc, but the same arguments apply, since it loads libgcc dynamically to get backtraces as well. An interesting point is that the source for uclibc's backtrace mentions that -fasynchronous-unwind-tables is necessary.

NOTE: I was planning on writing a full working code example, but I remembered that you have to use the sigaction version of signal handling and do something special to get stack traces on ARM. I have code that does it at work, and I will edit this post once I have it.

rationalcoder answered Oct 05 '22 08:10