Debugging core files generated on a Customer's box

We get core files from running our software on a Customer's box. Unfortunately, because we've always compiled with -O2 and without debugging symbols, this has led to situations where we could not figure out why it was crashing. We've modified the builds so that they now use -g and -O2 together. We then advise the Customer to run a -g binary so it becomes easier to debug.

I have a few questions:

1. What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
2. Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.

Here's an example of a crash that requires us to tell the Customer to get a -g ver. of the binary:

Program terminated with signal 11, Segmentation fault.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>

Ideally I'd like to find out exactly why the app crashed - I suspect it's memory corruption but I am not 100% sure.

Remote debugging is strictly not allowed.

Thanks

What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?

If the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.

The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.

But the chances that 0x00454ff1 is also in select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.

You can use disas select, and observe that 0x00454ff1 is either in the middle of an instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.

You can however help yourself: you just need to get a copy of all libraries that are listed in (gdb) info shared from the customer system. Have the customer tar them up with e.g.

cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
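The file list for that tar command can be derived from the info shared output rather than typed by hand. What follows is only a sketch: the helper name libs_from_info_shared and the file libs.txt are made up for illustration, and the parsing assumes library paths contain no spaces. It also adds the h flag to tar so that symlinks are followed, since lib/libc.so.6 is often a symlink to the real file.

```shell
# Hypothetical helper: read gdb "info shared" output on stdin and print one
# library path per line, relative to / so it can be fed to tar after "cd /".
# Entries without an absolute path (e.g. linux-gate.so.1) are skipped.
libs_from_info_shared() {
    awk '$NF ~ /^\// { sub(/^\//, "", $NF); print $NF }' | sort -u
}

# On your system (libs.txt is a placeholder name):
#   gdb -batch -ex 'info shared' /path/to/binary core | libs_from_info_shared > libs.txt
# On the customer system (h = follow symlinks, T = read file names from a list):
#   cd / && tar cvzhf to-you.tar.gz -T /path/to/libs.txt
```

The -T and h options are GNU tar features; on a different tar the customer may need to list the files explicitly, as in the command above.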

Then, on your system:

mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core  # Note: very important to set solib-... before loading core
(gdb) where      # Get meaningful stack trace!
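Since GDB quietly falls back to whatever it can find when a library is absent from the prefix, it may be worth checking the extracted tree first. A minimal sketch, assuming you keep a hypothetical libs.txt with one library path per line, relative to / (the helper name missing_libs is also made up):

```shell
# Hypothetical helper: read library paths (one per line, relative to /) on
# stdin and print those that do not exist under the extracted PREFIX.
missing_libs() {
    prefix=$1
    while IFS= read -r lib; do
        [ -e "$prefix/$lib" ] || printf '%s\n' "$lib"
    done
}

# Example: missing_libs /tmp/from-customer < libs.txt
# Once nothing is reported missing, the same session can run non-interactively:
#   gdb -batch -ex 'set solib-absolute-prefix /tmp/from-customer' \
#       -ex 'core-file core' -ex 'where' /path/to/binary
```

In the batch form the -ex commands run in the order given, so the prefix is still set before the core is loaded.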

We then advise the Customer to run a -g binary so it becomes easier to debug.

A much better approach is:

build with -g -O2 -o myexe.dbg
strip -g myexe.dbg -o myexe
distribute myexe to customers
when a customer gets a core, use myexe.dbg to debug it

You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
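The build steps above could be wrapped roughly as follows. This is a sketch under assumptions: build_release_pair is a hypothetical name, gcc is assumed as the compiler (override with CC), and a real build would go through your existing build system rather than a single compile command.

```shell
# Hypothetical helper: build SRC with full debug info, then produce the
# stripped copy that ships to customers. File names are placeholders.
build_release_pair() {
    src=$1
    exe=$2
    "${CC:-gcc}" -g -O2 -o "$exe.dbg" "$src" || return 1   # keep in-house
    strip -g "$exe.dbg" -o "$exe"                          # ship this one
}

# Example: build_release_pair main.c myexe
# Later, with a core produced by the shipped myexe:
#   gdb myexe.dbg core
```

Both files come from the same compile and link, which is why myexe.dbg can stand in for myexe when reading the core; archive the .dbg file for every release you ship.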

You can indeed get useful information from a crash dump, even one from an optimized compile (although it's what is called, technically, "a major pain in the ass"). A -g compile is indeed better, and yes, you can do so even when the machine on which the dump happened is another distribution. Basically, with one caveat, all the important information is contained in the executable and ends up in the dump.

When you match the core file with the executable, the debugger will be able to tell you where the crash occurred and show you the stack. That in itself should help a lot. You should also find out as much as you can about the situation in which it happens -- can they reproduce it reliably? If so, can you reproduce it?

Now, here's the caveat: the place where the notion of "everything is there" breaks down is with shared object files, .so files. If it is failing because of a problem with those, you won't have the symbol tables you need; you may only be able to see what library .so it happens in.

There are a number of books about debugging, but I can't think of one I'd recommend.

Tags: c++, linux, debugging, gdb

Mohamed Bana

2 Answers

Employed Russian

Charlie Martin
