
How to disassemble the main function of a stripped application?

Let's say I compiled the application below and stripped its symbols.

    #include <stdio.h>

    int main() {
        printf("Hello\n");
    }

Build procedure:

    gcc -o hello hello.c
    strip --strip-unneeded hello

If the application wasn't stripped, disassembling the main function would be easy. However, I have no idea how to disassemble the main function of a stripped application.

    (gdb) disas main
    No symbol table is loaded.  Use the "file" command.
    (gdb) info line main
    Function "main" not defined.

How could I do it? Is it even possible?

Notes: this must be done with GDB only. Forget objdump. Assume that I don't have access to the code.

A step-by-step example would be greatly appreciated.

karlphillip asked Mar 29 '11 16:03




2 Answers

OK, here is a big edit of my previous answer. I think I have found a way now.

You (still :) have this specific problem:

    (gdb) disas main
    No symbol table is loaded.  Use the "file" command.

If you compile the code with gcc -S (I added a return 0 at the end), you get:

        pushq   %rbp
        movq    %rsp, %rbp
        movl    $.LC0, %edi
        call    puts
        movl    $0, %eax
        leave
        ret

Now, you can see that your binary gives you some info:

Stripped:

    (gdb) info files
    Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
    Local exec file:
        `/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
        Entry point: 0x400440
        0x0000000000400238 - 0x0000000000400254 is .interp
        ...
        0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
        0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
        0x00000000004003f0 - 0x0000000000400408 is .init
        0x0000000000400408 - 0x0000000000400438 is .plt
        0x0000000000400440 - 0x0000000000400618 is .text
        ...
        0x0000000000601010 - 0x0000000000601020 is .data
        0x0000000000601020 - 0x0000000000601030 is .bss
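The "Entry point" value that info files reports is the e_entry field of the ELF header. As a minimal sketch of where that number comes from, the helper below reads it from the first bytes of a file image. The function name is hypothetical (it is not part of GDB), and the sketch assumes a little-endian ELF64 file such as the elf64-x86-64 binary above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GDB API): read e_entry from the start of a
 * little-endian ELF64 file image.
 * Returns 0 on success, -1 if the buffer is not such a header. */
int elf64_entry_point(const unsigned char *image, size_t len, uint64_t *entry)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(image != NULL && entry != NULL);
    if (len < 32 || memcmp(image, magic, sizeof magic) != 0)
        return -1;
    /* e_ident[4] is the class (2 = 64-bit), e_ident[5] the byte order
     * (1 = little-endian). Anything else is outside this sketch. */
    if (image[4] != 2 || image[5] != 1)
        return -1;

    /* e_entry is the 8-byte field at offset 24 of the ELF64 header. */
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | image[24 + i];
    *entry = value;
    return 0;
}
```

For the binary above, passing the first 64 bytes of the file would be expected to yield 0x400440, the same address at which .text starts.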

The most important entry here is .text. It is the conventional name for the section that holds the program's code, and here it starts at the entry point (0x400440). From its size, and from the explanation of main below, you can tell that it includes main. If you disassemble it, you will see a call to __libc_start_main. Most importantly, you are disassembling from a valid entry point that is real code (so you are not mistakenly decoding DATA as CODE).

    (gdb) disas 0x0000000000400440,0x0000000000400618
    Dump of assembler code from 0x400440 to 0x400618:
       0x0000000000400440:  xor    %ebp,%ebp
       0x0000000000400442:  mov    %rdx,%r9
       0x0000000000400445:  pop    %rsi
       0x0000000000400446:  mov    %rsp,%rdx
       0x0000000000400449:  and    $0xfffffffffffffff0,%rsp
       0x000000000040044d:  push   %rax
       0x000000000040044e:  push   %rsp
       0x000000000040044f:  mov    $0x400540,%r8
       0x0000000000400456:  mov    $0x400550,%rcx
       0x000000000040045d:  mov    $0x400524,%rdi
       0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
       0x0000000000400469:  hlt
       ...
       0x000000000040046c:  sub    $0x8,%rsp
       ...
       0x0000000000400482:  retq
       0x0000000000400483:  nop
       ...
       0x0000000000400490:  push   %rbp
       ...
       0x00000000004004f2:  leaveq
       0x00000000004004f3:  retq
       0x00000000004004f4:  data32 data32 nopw %cs:0x0(%rax,%rax,1)
       ...
       0x000000000040051d:  leaveq
       0x000000000040051e:  jmpq   *%rax
       ...
       0x0000000000400520:  leaveq
       0x0000000000400521:  retq
       0x0000000000400522:  nop
       0x0000000000400523:  nop
       0x0000000000400524:  push   %rbp
       0x0000000000400525:  mov    %rsp,%rbp
       0x0000000000400528:  mov    $0x40062c,%edi
       0x000000000040052d:  callq  0x400418 <puts@plt>
       0x0000000000400532:  mov    $0x0,%eax
       0x0000000000400537:  leaveq
       0x0000000000400538:  retq

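The call to __libc_start_main takes a pointer to main() as its first argument. On x86-64 the first argument is passed in %rdi, so the value loaded into %rdi just before the call is the address of main(). (On 32-bit x86 it would instead be the last value pushed on the stack before the call.)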

       0x000000000040045d:  mov    $0x400524,%rdi
       0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>

Here it is 0x400524 (as we already know). Now set a breakpoint there and try this:

    (gdb) break *0x400524
    Breakpoint 1 at 0x400524
    (gdb) run
    Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2

    Breakpoint 1, 0x0000000000400524 in main ()
    (gdb) n
    Single stepping until exit from function main,
    which has no line number information.
    hello 1
    __libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>,
        init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>,
        stack_end=0x7fffffffdc38) at libc-start.c:258
    258 libc-start.c: No such file or directory.
        in libc-start.c
    (gdb) n

    Program exited normally.
    (gdb)

Now you can disassemble it using:

    (gdb) disas 0x0000000000400524,0x0000000000400600
    Dump of assembler code from 0x400524 to 0x400600:
       0x0000000000400524:  push   %rbp
       0x0000000000400525:  mov    %rsp,%rbp
       0x0000000000400528:  sub    $0x10,%rsp
       0x000000000040052c:  movl   $0x1,-0x4(%rbp)
       0x0000000000400533:  mov    $0x40064c,%eax
       0x0000000000400538:  mov    -0x4(%rbp),%edx
       0x000000000040053b:  mov    %edx,%esi
       0x000000000040053d:  mov    %rax,%rdi
       0x0000000000400540:  mov    $0x0,%eax
       0x0000000000400545:  callq  0x400418 <printf@plt>
       0x000000000040054a:  mov    $0x0,%eax
       0x000000000040054f:  leaveq
       0x0000000000400550:  retq
       0x0000000000400551:  nop
       0x0000000000400552:  nop
       0x0000000000400553:  nop
       0x0000000000400554:  nop
       0x0000000000400555:  nop
       ...

That is essentially the solution.
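Reading main's address out of _start can also be sketched on the raw bytes. In the first _start listing, mov $0x400524,%rdi occupies 7 bytes (0x40045d to 0x400464), which matches the encoding 48 c7 c7 followed by a 32-bit immediate, and the callq that follows starts with e8. The helper below looks for exactly that byte pattern. Its name is hypothetical, and the pattern is an assumption that fits this kind of non-PIE startup code; other toolchains may load %rdi differently (for example with a RIP-relative lea), in which case it finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scan the bytes of _start for
 *     48 c7 c7 imm32    mov  $imm32,%rdi
 *     e8 rel32          call rel32
 * and store imm32 (the candidate address of main) in *main_addr.
 * Returns 0 on success, -1 if the pattern is not present. */
int find_main_in_start(const uint8_t *code, size_t len, uint32_t *main_addr)
{
    assert(code != NULL && main_addr != NULL);
    /* The two instructions together take 7 + 5 = 12 bytes. */
    for (size_t i = 0; i + 12 <= len; i++) {
        if (code[i] == 0x48 && code[i + 1] == 0xc7 && code[i + 2] == 0xc7 &&
            code[i + 7] == 0xe8) {
            /* The immediate is stored little-endian. */
            *main_addr = (uint32_t)code[i + 3] |
                         (uint32_t)code[i + 4] << 8 |
                         (uint32_t)code[i + 5] << 16 |
                         (uint32_t)code[i + 6] << 24;
            return 0;
        }
    }
    return -1;
}
```

Inside GDB, the bytes to feed such a check can be inspected with x/12xb 0x40045d; the address it yields is then the one to use with break * and disas.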

BTW, I used a different program here to check that the method works. That is why the assembly above is a bit different. It comes from this C file:

    #include <stdio.h>

    int main(void) {
        int i=1;
        printf("hello %d\n", i);
        return 0;
    }

But if this does not work, you still have some hints:

From now on, you should set breakpoints at the beginning of every function you can spot. A function usually begins right after the ret of the previous one (possibly after some nop padding). The first entry point is the start of .text itself. That is where execution begins (_start), but it is not main.

The problem is that a breakpoint will not always let you step through your program. Take this one at the very start of .text:

    (gdb) break *0x0000000000400440
    Breakpoint 2 at 0x400440
    (gdb) run
    Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2

    Breakpoint 2, 0x0000000000400440 in _start ()
    (gdb) n
    Single stepping until exit from function _start,
    which has no line number information.
    0x0000000000400428 in __libc_start_main@plt ()
    (gdb) n
    Single stepping until exit from function __libc_start_main@plt,
    which has no line number information.
    0x0000000000400408 in ?? ()
    (gdb) n
    Cannot find bounds of current function

So you need to keep trying until you find your way, setting breakpoints at:

    0x400440
    0x40046c
    0x400490
    0x4004f4
    0x40051e
    0x400524
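Some of these candidates can be found mechanically. As a rough sketch, the helper below scans code bytes for the frame-pointer prologue push %rbp; mov %rsp,%rbp (bytes 55 48 89 e5), which is how main begins in the listings above. The function name is hypothetical, and this is only a heuristic: functions that start differently (such as the one at 0x40046c, which opens with sub $0x8,%rsp) are missed, and the same bytes could in principle occur in the middle of another instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: collect addresses where the byte sequence
 * 55 48 89 e5 (push %rbp; mov %rsp,%rbp) occurs. `base` is the address
 * of code[0]. Stores at most max_out addresses in out[] and returns
 * how many were stored. */
size_t find_prologues(const uint8_t *code, size_t len, uint64_t base,
                      uint64_t *out, size_t max_out)
{
    static const uint8_t prologue[] = { 0x55, 0x48, 0x89, 0xe5 };
    size_t found = 0;

    assert(code != NULL && out != NULL);
    for (size_t i = 0; i + sizeof prologue <= len && found < max_out; i++) {
        if (memcmp(code + i, prologue, sizeof prologue) == 0)
            out[found++] = base + i;
    }
    return found;
}
```

Each address it reports is a candidate for break *ADDRESS in GDB, to be confirmed by disassembling from there.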

From the other answer, we should keep this info:

In the non-stripped version of the file, we see:

    (gdb) disas main
    Dump of assembler code for function main:
       0x0000000000400524 <+0>:     push   %rbp
       0x0000000000400525 <+1>:     mov    %rsp,%rbp
       0x0000000000400528 <+4>:     mov    $0x40062c,%edi
       0x000000000040052d <+9>:     callq  0x400418 <puts@plt>
       0x0000000000400532 <+14>:    mov    $0x0,%eax
       0x0000000000400537 <+19>:    leaveq
       0x0000000000400538 <+20>:    retq
    End of assembler dump.

Now we know that main spans 0x0000000000400524 to 0x0000000000400539. If we look at the same address range in the stripped binary, we get the same result:

    (gdb) disas 0x0000000000400524,0x0000000000400539
    Dump of assembler code from 0x400524 to 0x400539:
       0x0000000000400524:  push   %rbp
       0x0000000000400525:  mov    %rsp,%rbp
       0x0000000000400528:  mov    $0x40062c,%edi
       0x000000000040052d:  callq  0x400418 <puts@plt>
       0x0000000000400532:  mov    $0x0,%eax
       0x0000000000400537:  leaveq
       0x0000000000400538:  retq
    End of assembler dump.

So, unless you can get a hint about where main starts (for example from another build with symbols), another way is to know something about main's first instructions, so that you can disassemble at specific places and check whether they match. If you have no access to the code at all, you can still read the ELF specification to understand which sections should appear in the binary and try a calculated address. Either way, you need information about the sections in the binary!
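Matching known first instructions against the stripped code can be sketched as a byte-signature search with wildcards, since absolute addresses and call offsets may differ between builds. The helper name and the mask convention below are assumptions made for illustration; the signature bytes themselves would come from a reference build that still has symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the offset of the first position in code[]
 * where every sig[] byte whose mask[] entry is nonzero matches, or -1 if
 * there is none. Positions with mask 0 are wildcards (for example the
 * immediate of a mov or the rel32 of a call). */
long match_signature(const uint8_t *code, size_t len,
                     const uint8_t *sig, const uint8_t *mask, size_t sig_len)
{
    assert(code != NULL && sig != NULL && mask != NULL);
    if (sig_len == 0 || sig_len > len)
        return -1;
    for (size_t i = 0; i + sig_len <= len; i++) {
        size_t j = 0;
        while (j < sig_len && (!mask[j] || code[i + j] == sig[j]))
            j++;
        if (j == sig_len)
            return (long)i;
    }
    return -1;
}
```

Adding the returned offset to the start address of the dumped region gives the address to pass to disas in the stripped binary.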

That is hard work, my friend! Good luck!

Beco

DrBeco answered Sep 21 '22 00:09


How about doing info files to get the section list (with addresses), and going from there?

Example:

    (gdb) info files
    Symbols from "/home/bob/tmp/t".
    Local exec file:
        `/home/bob/tmp/t', file type elf64-x86-64.
        Entry point: 0x400490
        0x0000000000400270 - 0x000000000040028c is .interp
        0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
        ....
        0x0000000000400448 - 0x0000000000400460 is .init
        ....

Then disassemble .init:

    (gdb) disas 0x0000000000400448,0x0000000000400460
    Dump of assembler code from 0x400448 to 0x400460:
       0x0000000000400448:  sub    $0x8,%rsp
       0x000000000040044c:  callq  0x4004bc
       0x0000000000400451:  callq  0x400550
       0x0000000000400456:  callq  0x400650
       0x000000000040045b:  add    $0x8,%rsp
       0x000000000040045f:  retq

Then go ahead and disassemble the rest.

If I were you, and I had the same GCC version that your executable was built with, I'd examine the sequence of functions called in a dummy non-stripped executable. The sequence of calls is probably similar in most usual cases, so comparing the two might help you work through the startup sequence up to your main. Optimizations will probably get in the way, though.

If your binary is stripped and optimized, main might not exist as an "entity" in the binary; chances are you can't get much better than this type of procedure.

Mat answered Sep 25 '22 00:09