Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Help with understanding a very basic main() disassembly in GDB

Heyo,

I have written this very basic main function to experiment with disassembly and also to see and hopefully understand what is going on at the lower level:

int main() {
  return 6;
}

Using gdb to disas main produces this:

0x08048374 <main+0>:    lea    0x4(%esp),%ecx
0x08048378 <main+4>:    and    $0xfffffff0,%esp
0x0804837b <main+7>:    pushl  -0x4(%ecx)
0x0804837e <main+10>:   push   %ebp
0x0804837f <main+11>:   mov    %esp,%ebp
0x08048381 <main+13>:   push   %ecx
0x08048382 <main+14>:   mov    $0x6,%eax
0x08048387 <main+19>:   pop    %ecx
0x08048388 <main+20>:   pop    %ebp
0x08048389 <main+21>:   lea    -0x4(%ecx),%esp
0x0804838c <main+24>:   ret  

Here is my best guess as to what I think is going on and what I need help with line-by-line:

lea 0x4(%esp),%ecx

Load the address of esp + 4 into ecx. Why do we add 4 to esp?

I read somewhere that this is the address of the command line arguments. But when I did x/d $ecx I get the value of argc. Where are the actual command line argument values stored?

and $0xfffffff0,%esp

Align stack

pushl -0x4(%ecx)

Push the address of where esp was originally onto the stack. What is the purpose of this?

push %ebp

Push the base pointer onto the stack

mov %esp,%ebp

Move the current stack pointer into the base pointer

push %ecx

Push the address of original esp + 4 on to stack. Why?

mov $0x6,%eax

I wanted to return 6 here so i'm guessing the return value is stored in eax?

pop %ecx

Restore ecx to value that is on the stack. Why would we want ecx to be esp + 4 when we return?

pop %ebp

Restore ebp to value that is on the stack

lea -0x4(%ecx),%esp

Restore esp to it's original value

ret

I am a n00b when it comes to assembly so any help would be great! Also if you see any false statements about what I think is going on please correct me.

Thanks a bunch! :]

like image 725
masterwok Avatar asked Jan 20 '11 19:01

masterwok


People also ask

How to see disassembly in gdb?

From within gdb press Ctrl x 2 and the screen will split into 3 parts. First part will show you the normal code in high level language. Second will show you the assembly equivalent and corresponding instruction Pointer .

What does disassemble do in gdb?

If AUTO, GDB will display disassembly of next instruction only if the source line cannot be displayed. This setting causes GDB to display some feedback when you step through a function with no line info or whose source file is unavailable.

Which command is used to disassemble code?

The objdump command is generally used to inspect the object files and binary files. It prints the different sections in object files, their virtual memory address, logical memory address, debug information, symbol table, and other pieces of information. Here we'll see how we can use this tool to disassemble the files.

How does disassembly work?

In programming terminology, to disassemble is to convert a program in its executable (ready-to-run) form (sometimes called object code ) into a representation in some form of assembler language so that it is readable by a human.


1 Answers

Stack frames

The code at the beginning of the function body:

push  %ebp
mov   %esp, %ebp

is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.

After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you'd just invoke ret now, this address would be popped from the stack into the %eip (instruction pointer) and the code would execute further from that address (of the next instruction after the call). But we don't return yet, do we? ;-)

You then push %ebp register to save its previous value somewhere and not lose it, because you'll use it for something shortly. (BTW, it usually contains the base pointer of the caller function, and when you peek that value, you'll find a previously stored %ebp, which would be again a base pointer of the function one level higher, so you can trace the call stack that way.) When you save the %ebp, you can then store the current %esp (stack pointer) there, so that %ebp will point to the same address: the base of the current local stack. The %esp will move back and forth inside the procedure when you'll be pushing and popping values on the stack or reserving & freeing local variables. But %ebp will stay fixed, still pointing to the base of the local stack frame.

Accessing parameters

Parameters passed to the procedure by the caller are "burried just uner the ground" (that is, they have positive offsets relative to the base, because stack grows down). You have in %ebp the address of the base of the local stack, where lies the previous value of the %ebp. Below it (that is, at 4(%ebp) lies the return address. So the first parameter will be at 8(%ebp), the second at 12(%ebp) and so on.

Local variables

And local variables could be allocated on the stack above the base (that is, they'd have negative offsets relative to the base). Just subtract N to the %esp and you've just allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, i.e. -4(%ebp) is the first word, -8(%ebp) is second etc. Remember that (%ebp) points to the base of the local stack, where the previous %ebp value has been saved. So remember to restore the stack to the previous position before you try to restore the %ebp through pop %ebp at the end of the procedure. You can do it two ways:
1. You can free only the local variables by adding back the N to the %esp (stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and could be overwritten by subsequent pushes, so it's no longer safe to refer them. They're dead bodies ;-J )
2. You can flush the stack down to the ground and free all local space by simply restoring the %esp from the %ebp which has been fixed earlier to the base of the stack. It'll restore the stack pointer to the state it has just after entering the procedure and saving the %esp into %ebp. It's like loading the previously saved game when you've messed something ;-)

Turning off frame pointers

It's possible to have a less messy assembly from gcc -S by adding a switch -fomit-frame-pointer. It tells GCC to not assemble any code for setting/resetting the stack frame until it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to be able to track up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release targets and it saves some spacetime.

Call Frame Information

Sometimes you can meet some strange assembler directives starting from .cfi interleaved with the function header. This is a so-called Call Frame Information. It's used by debuggers to track the function calls. But it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn it off too in your assembly, by adding a switch -fno-dwarf2-cfi-asm. This tells the GCC to use plain old labels instead of those strange .cfi directives, and it adds a special data structures at the end of your assembly, refering to those labels. This doesn't turn off the CFI, just changes the format to more "transparent" one: the CFI tables are then visible to the programmer.

like image 152
SasQ Avatar answered Nov 07 '22 14:11

SasQ