Getting better debug information when a C programme crashes on Linux

We have an embedded version of the Linux kernel running on a MIPS core. The programme we have written runs a particular test suite. During one of the stress tests (which runs for about 12 hours) we get a segfault, which in turn generates a core dump.

Unfortunately the core dump is not very useful. The crash is in a dynamically linked system library (probably pthread or glibc). The backtrace from the core dump shows only the crash point and no callers, even though our user-space app is built with -g -O0:

Cannot access memory at address 0x2aab1004
(gdb) bt
#0  0x2ab05d18 in ?? ()
warning: GDB can't find the start of the function at 0x2ab05d18.

    GDB is unable to find the start of the function at 0x2ab05d18
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x2ab05d18 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
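Because the core yields only frame #0 and backtrace() is not available on this toolchain (see the EDIT below), one workaround is to record call sites yourself. The sketch below is a hypothetical "breadcrumb" ring buffer; the names trail, TRAIL_MARK and example_worker are made up for illustration, and it assumes GCC, since it relies on the GCC builtins __builtin_return_address and __sync_fetch_and_add. After a crash, the most recent 64 entries can be printed from the core with gdb (print trail) and, for a non-PIE executable, each address resolved with addr2line against the unstripped binary.

```c
#include <stdint.h>

/* Hypothetical breadcrumb ring buffer: an instrumented function calls
 * TRAIL_MARK() on entry, which stores the address it will return to
 * (a location inside its caller). */
#define TRAIL_LEN 64u /* power of two, so the index can be masked */

static volatile uintptr_t trail[TRAIL_LEN];
static unsigned trail_next; /* total marks so far, updated atomically */

static void trail_record(uintptr_t site)
{
    /* __sync builtins exist since GCC 4.1; masking wraps the slot index. */
    unsigned slot = __sync_fetch_and_add(&trail_next, 1u) & (TRAIL_LEN - 1u);
    trail[slot] = site;
}

/* Must be a macro: __builtin_return_address(0) has to be evaluated in the
 * instrumented function itself, not inside trail_record(). */
#define TRAIL_MARK() trail_record((uintptr_t)__builtin_return_address(0))

/* Example of an instrumented function (hypothetical name). */
void example_worker(void)
{
    TRAIL_MARK();
    /* ... real work ... */
}
```

Only instrumented functions leave breadcrumbs, and if the compiler inlines one, the recorded address belongs to the function it was inlined into; with -O0, as in the question, no inlining happens.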

Another problem is that we cannot run the test under gdb/gdbserver: it keeps breaking on __nptl_create_event. Since the test creates and destroys threads and timers every 5 s, it is almost impossible to sit for hours hitting continue.

EDIT: backtrace() and backtrace_symbols() are not supported on our toolchain.

Hence:

  1. Is there a way of trapping the segfault and generating more debug data (stack pointer, call stack, etc.)?

  2. Is there a way of getting more data out of a core dump when the crash happened inside a .so file?

Thanks.
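For question 1, a minimal sketch of a SIGSEGV/SIGBUS handler, assuming glibc-style headers. It reports the signal, the faulting address (si_addr) and, where the architecture is recognised, the saved registers, then copies /proc/self/maps so every address can be matched to the library it falls in. Inside the handler it sticks to async-signal-safe calls such as write(), read() and raise(). The MIPS register layout used here (uc_mcontext.pc, gregs[29] as sp, gregs[31] as ra) is an assumption about the C library's ucontext definitions and should be checked against your toolchain's <sys/ucontext.h>; on MIPS the saved ra can point back into a caller, which is often enough to continue unwinding by hand.

```c
#define _GNU_SOURCE /* exposes REG_RIP in <sys/ucontext.h> on x86_64 glibc */
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* Write v as 0x-prefixed hex using only write(). */
static void put_hex(int fd, uintptr_t v)
{
    char buf[2 + 2 * sizeof v];
    size_t i = sizeof buf;

    do {
        buf[--i] = "0123456789abcdef"[v & 0xfu];
        v >>= 4;
    } while (v != 0);
    buf[--i] = 'x';
    buf[--i] = '0';
    (void)write(fd, buf + i, sizeof buf - i);
}

static void put_str(int fd, const char *s)
{
    (void)write(fd, s, strlen(s));
}

/* Copy /proc/self/maps to fd so crash addresses can be matched to .so files. */
static void copy_maps(int out)
{
    char buf[512];
    ssize_t n;
    int fd = open("/proc/self/maps", O_RDONLY);

    if (fd < 0)
        return;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        (void)write(out, buf, (size_t)n);
    close(fd);
}

static void crash_handler(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;

    put_str(STDERR_FILENO, "fatal signal (hex) ");
    put_hex(STDERR_FILENO, (uintptr_t)sig);
    put_str(STDERR_FILENO, " fault addr ");
    put_hex(STDERR_FILENO, (uintptr_t)si->si_addr);
#if defined(__mips__)
    /* Assumption: glibc MIPS mcontext_t with pc, gregs[29] = sp, gregs[31] = ra. */
    put_str(STDERR_FILENO, "\npc ");
    put_hex(STDERR_FILENO, (uintptr_t)uc->uc_mcontext.pc);
    put_str(STDERR_FILENO, " sp ");
    put_hex(STDERR_FILENO, (uintptr_t)uc->uc_mcontext.gregs[29]);
    put_str(STDERR_FILENO, " ra ");
    put_hex(STDERR_FILENO, (uintptr_t)uc->uc_mcontext.gregs[31]);
#elif defined(__x86_64__)
    put_str(STDERR_FILENO, "\npc ");
    put_hex(STDERR_FILENO, (uintptr_t)uc->uc_mcontext.gregs[REG_RIP]);
#else
    (void)uc;
#endif
    put_str(STDERR_FILENO, "\n");
    copy_maps(STDERR_FILENO);
    /* SA_RESETHAND restored SIG_DFL on entry, so re-raising ends the
     * process with the default action and a core is still written. */
    raise(sig);
}

int install_crash_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Call install_crash_handler() early in main(), before the test threads are created; signal dispositions are process-wide, so the handler covers every thread.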

user626201 asked Nov 24 '11 04:11


2 Answers

GDB can't find the start of the function at 0x2ab05d18

What is at that address at the time of the crash?

Run info shared and check whether the address range of any library contains that address.
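When gdb cannot read the library list from the core, the same lookup can be done from inside the still-running process. A sketch, assuming the Linux /proc/self/maps line format (the helper name find_mapping is hypothetical): it returns the path of the mapping that contains an address, plus that address's offset into the file. For a shared library whose executable segment has equal file offset and virtual address, as is common, that offset can be given to addr2line -e together with an unstripped copy of the library on the host. It uses stdio, so it is not async-signal-safe and should not be called from a signal handler.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the /proc/self/maps entry containing addr. On success copy the
 * mapping's path into path, store addr's offset within the mapped file in
 * *file_off and return 1; return 0 if no mapping contains addr. */
int find_mapping(uintptr_t addr, char *path, size_t pathlen, uintptr_t *file_off)
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        uintptr_t start, end, off;
        const char *name;

        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %*s %" SCNxPTR,
                   &start, &end, &off) != 3)
            continue;
        if (addr < start || addr >= end)
            continue;

        /* Pathnames start with '/' (files) or '[' ([stack], [heap], ...);
         * anonymous mappings have none. */
        name = strchr(line, '/');
        if (name == NULL)
            name = strchr(line, '[');
        if (name == NULL)
            name = "";
        snprintf(path, pathlen, "%.*s", (int)strcspn(name, "\n"), name);
        *file_off = addr - start + off;
        found = 1;
    }
    fclose(f);
    return found;
}
```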

The most likely cause of your troubles: did you run strip libpthread.so.0 before uploading it to your target? Don't do that: GDB requires libpthread.so.0 not to be stripped. If your toolchain's libpthread.so.0 contains debug info (and is thus too large for the target), run strip -g on it rather than a full strip.
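To check whether the copy of libpthread.so.0 on the target still has its symbol table, a small sketch based on <elf.h> can look for a SHT_SYMTAB section: a full strip removes .symtab, whereas strip -g only removes the debugging sections and keeps it. readelf -S shows the same information where binutils is available; this helper (elf_has_symtab, a hypothetical name) is meant for targets without it. It handles 32- and 64-bit ELF but reports an error for files whose byte order differs from the host's instead of byte-swapping them.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Walk the section headers looking for SHT_SYMTAB; EHDR/SHDR select the
 * 32- or 64-bit structures. Sets result to 1 (found), 0 (absent), -1 (error). */
#define SCAN_FOR_SYMTAB(EHDR, SHDR)                                   \
    do {                                                              \
        EHDR eh;                                                      \
        SHDR sh;                                                      \
        rewind(f);                                                    \
        if (fread(&eh, sizeof eh, 1, f) != 1)                         \
            break;                                                    \
        result = 0;                                                   \
        for (unsigned i = 0; i < eh.e_shnum; i++) {                   \
            long pos = (long)eh.e_shoff + (long)i * eh.e_shentsize;   \
            if (fseek(f, pos, SEEK_SET) != 0 ||                       \
                fread(&sh, sizeof sh, 1, f) != 1) {                   \
                result = -1;                                          \
                break;                                                \
            }                                                         \
            if (sh.sh_type == SHT_SYMTAB) {                           \
                result = 1;                                           \
                break;                                                \
            }                                                         \
        }                                                             \
    } while (0)

/* Return 1 if the ELF file has a .symtab, 0 if it was fully stripped,
 * -1 if it cannot be read, is not ELF, or has a foreign byte order. */
int elf_has_symtab(const char *path)
{
    unsigned char ident[EI_NIDENT];
    const unsigned short one = 1;
    int host_le = *(const unsigned char *)&one == 1;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
        memcmp(ident, ELFMAG, SELFMAG) == 0 &&
        (ident[EI_DATA] == ELFDATA2LSB) == host_le) {
        if (ident[EI_CLASS] == ELFCLASS32)
            SCAN_FOR_SYMTAB(Elf32_Ehdr, Elf32_Shdr);
        else if (ident[EI_CLASS] == ELFCLASS64)
            SCAN_FOR_SYMTAB(Elf64_Ehdr, Elf64_Shdr);
    }
    fclose(f);
    return result;
}
```

Usage on the target: elf_has_symtab("/lib/libpthread.so.0") returning 0 would suggest the library was fully stripped (the path is illustrative; use wherever the library actually lives).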

Update:

info shared produced "Cannot access memory at address 0x2ab05d18"

This means that GDB cannot access the shared library list (which would also explain the missing stack trace). The most common cause: the binary that actually produced the core does not match the binary you gave to GDB. A less common cause: your core dump was truncated (perhaps because ulimit -c was set too low).
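If truncation is the suspect, the program can lift its own soft core-size limit to the hard limit at startup. A minimal sketch using getrlimit/setrlimit with RLIMIT_CORE (raise_core_limit is a hypothetical name). It cannot go beyond the hard limit, which an unprivileged process cannot raise, so the hard limit itself may still need to be raised in the privileged shell that starts the test (for example with ulimit -c unlimited).

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit so that a crash
 * hours into a stress test is not cut short. Returns 0 on success. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* soft limit may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```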

Employed Russian answered Nov 15 '22 09:11


If all else fails run the command using the debugger!

Just put "gdb --args" in front of your normal start command and enter "run" to start the process. When the task segfaults, it will drop back to the interactive gdb prompt instead of dumping core, and you should then be able to get more meaningful stack traces etc.

Another option is to use "truss" if it is available (the Linux equivalent is strace). It will tell you which system calls were being made at the time of the abend.

James Anderson answered Nov 15 '22 08:11