
How to increase probability of Linux core dumps matching symbols?

I have a very complex cross-platform application. Recently my team and I have been running stress tests and have encountered several crashes (and core dumps accompanying them). Some of these core dumps are very precise, and show me the exact location where the crash occurred with around 10 or more stack frames. Others sometimes have just one stack frame with ?? being the only symbol!

What I'd like to know is:

  1. Is there a way to increase the probability of core dumps pointing in the right direction?
  2. Why isn't the number of stack frames reported consistent?
  3. Is there any best-practice advice for managing core dumps?

Here's how I compile the binaries (in release mode):

  1. Compiler and platform: g++ with glibc-2.3.2-95.50 on CentOS 3.6 x86_64 -- This helps me maintain compatibility with older versions of Linux.
  2. All files are compiled with the -g flag.
  3. Debug symbols are stripped from the final binary and saved in a separate file.
  4. When I have a core dump, I use GDB with the executable which created the core, and the symbols file. GDB never complains that there's a mismatch between the core/binary/symbols.

Yet I sometimes get core dumps with no symbols at all! I understand that I'm linking against non-debug versions of libstdc++ and libgcc, but it would be nice if the stack trace at least showed where in my code the faulting call originated (even if the trace ultimately ends in ??).

themoondothshine asked Jan 06 '11 10:01



2 Answers

Others sometimes have just one stack frame with "??" being the only symbol!

There can be many reasons for that, among others:

  • the stack frame was trashed (overwritten)
  • EBP/RBP (on x86/x86-64) does not currently hold a valid frame pointer; this can happen, for example, in units compiled with -fomit-frame-pointer or in assembly units that do not maintain it

Note that the second case can arise simply because a library such as glibc was compiled that way. Installing the debug info for such system libraries could mitigate this (on openSUSE, for example, these are the glibc-debug{info,source} packages).

gdb has more control over the program than glibc does, so if gdb cannot produce a backtrace, glibc's backtrace call will naturally be unable to produce one either.

But shipping the source would be much easier :-)

user562374 answered Oct 05 '22 17:10


As an alternative, on a glibc system, you could call the backtrace function (together with backtrace_symbols or backtrace_symbols_fd) and filter the results yourself, so that only the symbols belonging to your own code are displayed. It's a bit more work, but you can really tailor it to your needs.
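A minimal sketch of that idea, kept to C++98 so it should build with an older g++. The helper names capture_backtrace and filter_frames are hypothetical, and the filter assumes glibc's usual line layout of "module(function+offset) [address]", which is worth checking against your own glibc version:

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc-specific)
#include <cstdlib>     // free
#include <string>
#include <vector>

// Hypothetical helper: capture the current call stack as text,
// one line per frame, using at most max_frames frames.
std::vector<std::string> capture_backtrace(int max_frames) {
    std::vector<std::string> lines;
    if (max_frames <= 0) return lines;

    std::vector<void*> addrs(max_frames);
    int n = backtrace(&addrs[0], max_frames);

    // backtrace_symbols allocates one block holding all the strings;
    // the caller frees only the returned pointer.
    char** symbols = backtrace_symbols(&addrs[0], n);
    if (symbols == NULL) return lines;
    for (int i = 0; i < n; ++i) lines.push_back(symbols[i]);
    free(symbols);
    return lines;
}

// Hypothetical helper: keep only the frames whose module path ends with
// `module`. Assumes each line starts with the module path, followed by
// '(' or a space (an assumption about glibc's output format).
std::vector<std::string> filter_frames(const std::vector<std::string>& frames,
                                       const std::string& module) {
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        const std::string& line = frames[i];
        std::string path = line.substr(0, line.find_first_of("( "));
        if (path.size() >= module.size() &&
            path.compare(path.size() - module.size(), module.size(), module) == 0) {
            kept.push_back(line);
        }
    }
    return kept;
}
```

Calling filter_frames(capture_backtrace(64), "myapp") would keep only the frames located in a binary named myapp (a placeholder name). Two caveats: function names only show up for symbols visible to the dynamic linker, so linking with -rdynamic may be needed, and backtrace_symbols calls malloc, so inside a crash signal handler backtrace_symbols_fd is the safer choice.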

whoami answered Oct 05 '22 18:10