 

Correlate Source with Assembly Listing of a C++ Program

Analyzing a core dump from a retail build often requires correlating the objdump output of a specific module with its source. That correlation becomes painful when the function is quite involved. Today I created an assembly listing of one particular module (with the compile option -S), expecting to see the source interleaved with the assembly, or at least some correlation between the two. Unfortunately the listing was not friendly enough to correlate, so I was wondering:

  • Given a core dump from which I can determine the crash location,
  • an objdump of the failing module, and
  • an assembly listing produced by recompiling the module with the -S option.

Is it possible to do a one-to-one correspondence with the source?

As an example I see the assembly listing as

.LBE7923:
        .loc 2 4863 0
        movq    %rdi, %r14
        movl    %esi, %r12d
        movl    696(%rsp), %r15d
        movq    704(%rsp), %rbp
.LBB7924:
        .loc 2 4880 0
        testq   %rdx, %rdx
        je      .L2680
.LVL2123:
        testl   %ecx, %ecx
        jle     .L2680
        movslq  %ecx,%rax
        .loc 2 4882 0
        testl   %r15d, %r15d
        .loc 2 4880 0
        leaq    (%rax,%rax,4), %rax
        leaq    -40(%rdx,%rax,8), %rdx
        movq    %rdx, 64(%rsp)

but I could not understand how to interpret labels like .LVL2123 and directives like .loc 2 4863 0.

Note: As the answers describe, reading through the assembly and intuitively matching patterns based on symbols (function calls, branches, return statements) is what I generally do. I am not denying that it works, but when a function is quite involved, reading through pages of assembly listing is a pain, and the listing often fails to match the source, either because functions were inlined or because the optimizer rearranged the code as it pleased. Seeing how efficiently Valgrind handles optimized binaries, and how WinDBG handles optimized binaries on Windows, I have a feeling there is something I am missing. So I thought I would start with the compiler output and use it to correlate. If my compiler is responsible for mangling the binary, it should be the best placed to say how to correlate it with the source, but unfortunately its output was the least helpful, and the .loc directive is really misleading. I often have to read through unreproducible dumps across various platforms; I spend the least time debugging Windows minidumps through WinDBG and considerable time debugging Linux core dumps. I thought that maybe I am not doing things correctly, so I came up with this question.

Abhijit asked May 02 '12 12:05



3 Answers

Is it possible to do a one-to-one correspondence with the source?

A: no, unless all optimisation is disabled. The compiler may emit some group of instructions (or instruction-like things) per line initially, but the optimiser then reorders, splits, fuses and generally changes them completely.


If I'm disassembling release code, I look for the instructions that should have a clear logical relationship to the code. For example,

.LBB7924:
        .loc 2 4880 0
        testq   %rdx, %rdx
        je      .L2680

looks like a branch if %rdx is zero, and it comes from line 4880. Find the line, identify the variable being tested, make a note that it's currently assigned to %rdx.

.LVL2123:
        testl   %ecx, %ecx
        jle     .L2680

OK, so this test and branch has the same target, so whatever comes next knows %rdx is nonzero and %ecx is positive (jle after testl branches when the value is zero or negative). The original code might be structured like:

if (a && b) {

or perhaps it was:

if (!a || !b) {

and the optimiser reordered the two branches ...
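To make the pattern concrete, here is a sketch of two source shapes that an optimising compiler could plausibly lower to the same pair of test-and-branch instructions. The function names, parameters and loop body are hypothetical and are not taken from the question's source; the only point is that a null check on a pointer plus a signed "greater than zero" check on a count matches testq/je followed by testl/jle.

```cpp
// Hypothetical functions, for illustration only.

// Shape 1: positive guard, as in `if (a && b) {`.
long sum_guarded(const long* items, int count) {
    long total = 0;
    if (items && count > 0) {   // pointer test, then signed count test
        for (int i = 0; i < count; ++i) total += items[i];
    }
    return total;
}

// Shape 2: early exit, as in `if (!a || !b) {`.
long sum_early_exit(const long* items, int count) {
    if (!items || count <= 0) return 0;   // both failing cases share one target
    long total = 0;
    for (int i = 0; i < count; ++i) total += items[i];
    return total;
}
```

Both shapes behave identically for every input, which is why the listing alone cannot tell you which one the author wrote. Match on the conditions being tested, not on the layout of the if statement.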

Now you've got some structure you can hopefully match to the original code, you can also figure out the register assignments. Eg, if you know the thing being tested is the data member of some structure, read backwards to see where %rdx was loaded from memory: was it loaded from a fixed offset to some other register? If so, that register is probably the object address.
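As a rough illustration of the "fixed offset from some other register" idea: when a member is read through a pointer to an object, the load usually appears as a displacement off the register holding that pointer, and the displacement equals the member's offset within the structure. The struct below is hypothetical, and the exact offsets depend on the target ABI, so treat the numbers in the comments as typical x86-64 Linux values and not as guarantees.

```cpp
#include <cstddef>

// Hypothetical layout, for illustration only.
struct Request {
    long        id;       // typically offset 0
    const char* buffer;   // typically offset 8 on x86-64 Linux
    int         length;   // typically offset 16 on x86-64 Linux
};

// A read of `r->buffer` with `r` held in %rdi would typically show up as
// something like `movq 8(%rdi), %rdx`: base register = object address,
// displacement = offsetof(Request, buffer).
constexpr std::size_t buffer_offset() { return offsetof(Request, buffer); }
constexpr std::size_t length_offset() { return offsetof(Request, length); }
```

Computing the offsets of the candidate structure's members this way and comparing them with the displacements in the listing is one way to confirm which member a register was loaded from.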

Good luck!

Useless answered Oct 01 '22 21:10



The .loc directive is what you're looking for. In your listing the directives indicate lines 4863, 4880, etc. There is no perfect mapping between source and optimized assembly (which is why you see 4880 more than once), but .loc is how you know which source line each run of instructions came from. The file number refers to a .file directive earlier in the listing. The syntax is:

.loc <file> <line> <column>
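
If paging through the listing by hand gets tedious, the directive is regular enough to process mechanically. The following is a minimal sketch, not an established tool: `group_by_line` is a name made up for this example, and it assumes a GCC-style listing in which every instruction belongs to the most recent .loc directive above it. It ignores the file number, so instructions inlined from other files would be mixed into the same line number; extend the key to (file, line) if that matters.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Group the instructions of a -S listing by the source line named in the
// most recent `.loc <file> <line> <column>` directive.
// Labels (ending in ':') and other directives (starting with '.') are skipped.
std::map<int, std::vector<std::string>> group_by_line(const std::string& listing) {
    std::map<int, std::vector<std::string>> by_line;
    std::istringstream in(listing);
    std::string raw;
    int current = -1;  // no .loc seen yet
    while (std::getline(in, raw)) {
        std::istringstream fields(raw);
        std::string first;
        if (!(fields >> first)) continue;  // blank line
        if (first == ".loc") {
            int file = 0, line = 0;
            if (fields >> file >> line) current = line;
            continue;
        }
        if (first[0] == '.' || first.back() == ':') continue;  // directive or label
        if (current < 0) continue;  // instruction before any .loc
        // Re-join the instruction text with single spaces.
        std::string text = first, rest;
        while (fields >> rest) text += " " + rest;
        by_line[current].push_back(text);
    }
    return by_line;
}
```

Run over the excerpt in the question, this collects the scattered 4880 instructions under one key, which makes it easier to see everything the optimizer derived from a single source line.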
Rob Napier answered Oct 01 '22 20:10



Unless you statically link against system libraries, there will be symbolic names in the binary even without debug symbols: those of the system library functions it links to.

These can often help narrow down where you are in the code. For example, if you see that function foo() calls open() and then ioctl(), and then crashes right before calling read(), you can probably find that point in the source of foo quite easily. (For that matter you might not even need the dump: on Linux you can get a record of where the crash occurs relative to library and system calls using ltrace or strace.)

Note that in some binary formats though, there may be an indirection to library functions via tiny wrappers elsewhere in the binary. Often a dump will still have relevant symbolic name information at the address of the invocation in the program flow. But even if not, you can recognize those external linkage wrappers by their range of address in the binary, and when you see one you can go find its code and figure out what external function it links to.

But as others have mentioned, if you have the source code and a system where it crashes frequently enough to be helpful, your fastest bet would usually be to rebuild with debug symbols, or insert logging output, and get a more useful crash record.

Chris Stratton answered Oct 01 '22 20:10
