Disassemble core file backwards with gdb

During a fuzzing session, the application under test crashed and produced a core file. When I open the core file with gdb, it reports that the application has no debugging symbols.

The backtrace command shows the following:

 Program terminated with signal 11, Segmentation fault.
 #0  0x0000000000000000 in ?? () 
  (gdb) bt
 #0  0x0000000000000000 in ?? ()
 #1  0x00007f8749f065d3 in ?? ()
 #2  0x00007f874c6a9000 in ?? ()
 #3  0x00007f8749f06568 in ?? ()
 #4  0x00007f874c6ab9a0 in ?? ()
 #5  0x00007f874c6aaa00 in ?? ()
 #6  0x0000000000000001 in ?? ()
 #7  0x00007f8749f06664 in ?? ()
 #8  0x0000000000000000 in ?? ()

So, when I type (gdb) disass 0x00007f8749f065d3, +1 then I get the following output:

Dump of assembler code from 0x7f8749f065d3 to 0x7f8749f065d4:
   0x00007f8749f065d3:  mov    QWORD PTR [rbx+0x70],0x0
End of assembler dump.

And now my question:

For example, when I am at the address above (0x00007f8749f065d3) and want to analyze, say, the two instructions executed before it, what must I type?

Note: A command like (gdb) disass 0x00007f8749f065d3, -2, which I tried intuitively, did not help.

Best regards,

user3097712 asked Mar 12 '23 07:03



1 Answer

Do you need to understand how to associate source lines with assembler instructions? Or do you mean how to disassemble the instructions before the current instruction?

Here are answers to both questions.

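The hex numbers shown in a backtrace are code addresses. Frame #0 shows the program counter at the time of the crash; each outer frame shows the return address in the caller, that is, the instruction immediately after the call that created the next frame. The instruction just before such an address is therefore normally a call. Without symbols or unwind information gdb has to guess at the frames, so some of the entries in your backtrace may not be real return addresses.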

A compiler, like gcc, turns source code into assembler instructions. When asked to (for example with gcc -g), it also includes 'DWARF' debugging information in the executable image. The DWARF information is used to associate a set of assembler instructions with a particular line of source code.

Consider this C fragment:

int main(int argc, char *argv[])
{
    int s, tnum, opt, num_threads;
    struct thread_info *tinfo;
    pthread_attr_t attr;
    int stack_size;
    void *res;

    /* The "-s" option specifies a stack size for our threads */

    stack_size = -1;
    while ((opt = getopt(argc, argv, "s:")) != -1) {
        switch (opt) {
        case 's':
            stack_size = strtoul(optarg, NULL, 0);
            break;

Using disass /m demonstrates how gdb associates the source lines with the assembler.

48         {
   0x0000000000400bdf <+0>: push   %rbp
   0x0000000000400be0 <+1>: mov    %rsp,%rbp
   0x0000000000400be3 <+4>: add    $0xffffffffffffff80,%rsp
   0x0000000000400be7 <+8>: mov    %edi,-0x74(%rbp)
   0x0000000000400bea <+11>:    mov    %rsi,-0x80(%rbp)
   0x0000000000400bee <+15>:    mov    %fs:0x28,%rax
   0x0000000000400bf7 <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000400bfb <+28>:    xor    %eax,%eax

49             int s, tnum, opt, num_threads;
50             struct thread_info *tinfo;
51             pthread_attr_t attr;
52             int stack_size;
53             void *res;
54  
55             /* The "-s" option specifies a stack size for our threads */
56  
57             stack_size = -1;
   0x0000000000400bfd <+30>:    movl   $0xffffffff,-0x60(%rbp)

58             while ((opt = getopt(argc, argv, "s:")) != -1) {
   0x0000000000400c04 <+37>:    jmp    0x400c56 <main+119>
   0x0000000000400c56 <+119>:   mov    -0x80(%rbp),%rcx
   0x0000000000400c5a <+123>:   mov    -0x74(%rbp),%eax
   0x0000000000400c5d <+126>:   mov    $0x401002,%edx
   0x0000000000400c62 <+131>:   mov    %rcx,%rsi
   0x0000000000400c65 <+134>:   mov    %eax,%edi
   0x0000000000400c67 <+136>:   callq  0x400a00 <getopt@plt>
   0x0000000000400c6c <+141>:   mov    %eax,-0x5c(%rbp)
   0x0000000000400c6f <+144>:   cmpl   $0xffffffff,-0x5c(%rbp)
   0x0000000000400c73 <+148>:   jne    0x400c06 <main+39>

59                 switch (opt) {
   0x0000000000400c06 <+39>:    mov    -0x5c(%rbp),%eax
   0x0000000000400c09 <+42>:    cmp    $0x73,%eax
   0x0000000000400c0c <+45>:    jne    0x400c2c <main+77>

60                 case 's':
61                     stack_size = strtoul(optarg, NULL, 0);
   0x0000000000400c0e <+47>:    mov    0x2014cb(%rip),%rax        # 0x6020e0 <optarg@@GLIBC_2.2.5>
   0x0000000000400c15 <+54>:    mov    $0x0,%edx
   0x0000000000400c1a <+59>:    mov    $0x0,%esi
   0x0000000000400c1f <+64>:    mov    %rax,%rdi
   0x0000000000400c22 <+67>:    callq  0x400a10 <strtoul@plt>
   0x0000000000400c27 <+72>:    mov    %eax,-0x60(%rbp)

62                     break;
   0x0000000000400c2a <+75>:    jmp    0x400c56 <main+119>

Some source lines compile to a single instruction, while others produce several. Note also that the addresses are out of order for some lines: the loop test for line 58 is placed after the loop body, at 0x400c56.

As you can see, with no debugging information in the executable there is no automated way to associate an assembler instruction with a source line. The very purpose of the debugging information is to associate the two.

For variable-length instruction architectures like x86, there is no easy way to step backwards to the previous instruction. The best strategy in the debugger is simply to subtract some value, say 16, from the address and disassemble from there. If the instruction at the address you started from shows up correctly in the listing, then most of the disassembly leading up to it is correct. From my example, let's consider 0x0000000000400c06.

So by subtracting 16 (0x10), we have 0x400bf6.

(gdb) disass 0x0000000000400bf6, +20
Dump of assembler code from 0x400bf6 to 0x400c0a:
   0x0000000000400bf6 <main+23>:    add    %cl,-0x77(%rax)
   0x0000000000400bf9 <main+26>:    rex.RB clc 
   0x0000000000400bfb <main+28>:    xor    %eax,%eax
   0x0000000000400bfd <main+30>:    movl   $0xffffffff,-0x60(%rbp)
   0x0000000000400c04 <main+37>:    jmp    0x400c56 <main+119>
   0x0000000000400c06 <main+39>:    mov    -0x5c(%rbp),%eax
   0x0000000000400c09 <main+42>:    cmp    $0x73,%eax

Compared with the disassembly above, this listing is correct starting with the 'xor' instruction. Once you are familiar with assembler, you will notice odd instructions like the 'rex', which are a sign that the decoder started in the middle of an instruction.
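The subtract-and-check heuristic can be sketched in Python. This is only an illustration and not part of gdb: the helper names (parse_listing, instructions_before, gdb_window) are made up for this example, and the listing is the one copied from the session above. The check is simple: if the target address shows up as an instruction boundary, the decoder has resynchronized somewhere before it, and the last few instructions in front of it are probably real. It is still worth eyeballing them, because the first lines of the window can be garbage.

```python
import re

# Output of "disass 0x400bf6, +20", copied from the gdb session above.
LISTING = """\
   0x0000000000400bf6 <main+23>:    add    %cl,-0x77(%rax)
   0x0000000000400bf9 <main+26>:    rex.RB clc
   0x0000000000400bfb <main+28>:    xor    %eax,%eax
   0x0000000000400bfd <main+30>:    movl   $0xffffffff,-0x60(%rbp)
   0x0000000000400c04 <main+37>:    jmp    0x400c56 <main+119>
   0x0000000000400c06 <main+39>:    mov    -0x5c(%rbp),%eax
   0x0000000000400c09 <main+42>:    cmp    $0x73,%eax
"""

# Address, optional <symbol+offset>, colon, then the instruction text.
LINE_RE = re.compile(
    r"^\s*(?:=>\s*)?(0x[0-9a-fA-F]+)(?:\s+<[^>]*>)?:\s+(.*\S)\s*$"
)


def gdb_window(target, back=16, extra=4):
    """Build the gdb command that disassembles a window ending past target."""
    start = target - back
    return f"disass {start:#x}, +{back + extra}"


def parse_listing(text):
    """Return [(address, instruction_text), ...] from gdb disass output."""
    insns = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            insns.append((int(m.group(1), 16), m.group(2)))
    return insns


def instructions_before(text, target, count):
    """Return the `count` decoded instructions preceding `target`.

    Returns None when `target` is not an instruction boundary in the
    listing, which means the decoder never resynchronized and a different
    start address should be tried.
    """
    insns = parse_listing(text)
    addrs = [addr for addr, _ in insns]
    if target not in addrs:
        return None
    i = addrs.index(target)
    return insns[max(0, i - count):i]


if __name__ == "__main__":
    print(gdb_window(0x400c06))
    for addr, text in instructions_before(LISTING, 0x400c06, 2):
        print(f"{addr:#x}  {text}")
```

If instructions_before returns None, subtract a different amount (15, 17, ...) and disassemble again.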

It would be reasonable to ask how gdb disassembles the code. The answer is that it always starts at the first instruction of the function and works its way downward.
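To see why starting at the function entry works, here is a small Python sketch of that forward walk. It is an illustration under assumptions, not how gdb is implemented: the instruction lengths are read off the <+offset> column of the disass /m listing above instead of being decoded from bytes, and the function names are hypothetical.

```python
# Start of main, from the disass /m listing above.
FUNC_START = 0x400bdf

# Instruction lengths in bytes, derived from consecutive <+offset> values
# in the listing: +0, +1, +4, +8, +11, +15, +24, +28, +30, +37, +39.
LENGTHS = [1, 3, 4, 3, 4, 9, 4, 2, 7, 2]


def boundaries(start, lengths):
    """Walk forward from `start`, returning every instruction boundary."""
    addrs = [start]
    for n in lengths:
        addrs.append(addrs[-1] + n)
    return addrs


def previous_instructions(start, lengths, target, count):
    """Return the addresses of the `count` instructions before `target`."""
    addrs = boundaries(start, lengths)
    if target not in addrs:
        raise ValueError(f"{target:#x} is not an instruction boundary")
    i = addrs.index(target)
    return addrs[max(0, i - count):i]


if __name__ == "__main__":
    for addr in previous_instructions(FUNC_START, LENGTHS, 0x400c06, 2):
        print(f"{addr:#x}")
```

Note that 0x400bf6, the address used in the subtract-16 attempt, is not among the boundaries produced by this walk, which is why the first lines of that listing were garbage. Without symbols gdb does not know where the function starts, so it cannot do this walk for you.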

Good question!

Matthew Fisher answered Mar 20 '23 13:03
