Understanding the purpose of some assembly statements

I am trying to understand some assembly code and managed to finish most of it except a few lines. I am able to understand most of what is happening inside but am not able to fully understand what (and why it) is happening at the beginning and ending of the code. Can someone shed some light on this?

int main() {
    int a, b;
    a = 12;
    b = 20;
    b = a + 123;
    return 0;
}

Disassembled Version:

 8048394:8d 4c 24 04          lea    0x4(%esp),%ecx              ; ??
 8048398:83 e4 f0             and    $0xfffffff0,%esp            ; ??
 804839b:ff 71 fc             pushl  -0x4(%ecx)                  ; ??
 804839e:55                   push   %ebp                        ; Store the Base pointer
 804839f:89 e5                mov    %esp,%ebp                   ; Initialize the Base pointer with the stack pointer
 80483a1:51                   push   %ecx                        ; ??
 80483a2:83 ec 4c             sub    $0x4c,%esp                  ; ??
 80483a5:c7 45 f8 0c 00 00 00 movl   $0xc,-0x8(%ebp)             ; Move 12 into -0x8(%ebp)
 80483ac:c7 45 f4 14 00 00 00 movl   $0x14,-0xc(%ebp)            ; Move 20 into -0xc(%ebp)
 80483b3:8b 45 f8             mov    -0x8(%ebp),%eax             ; Move 12@-0x8(%ebp) into eax
 80483b6:83 c0 7b             add    $0x7b,%eax                  ; Add 123 to 12@eax
 80483b9:89 45 f4             mov    %eax,-0xc(%ebp)             ; Store the result into b@-0xc(%ebp)
 80483bc:b8 00 00 00 00       mov    $0x0,%eax                   ; Move 0 into eax
 80483c1:83 c4 10             add    $0x10,%esp                  ; ??
 80483c4:59                   pop    %ecx                        ; ??
 80483c5:5d                   pop    %ebp                        ; ??
 80483c6:8d 61 fc             lea    -0x4(%ecx),%esp             ; ??
Legend asked Nov 19 '10 18:11


2 Answers

The stack grows downward. A push subtracts from the stack pointer (esp) and a pop adds to esp. You have to keep that in mind to understand a lot of this.
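To make the direction of growth concrete, here is a toy model in C. It is only a sketch: `ToyStack`, `toy_push` and `toy_pop` are made-up names for illustration, and the "stack" is an ordinary array rather than the real machine stack.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a downward-growing stack (hypothetical names, not a real API). */
enum { STACK_WORDS = 16 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* simulated stack memory, one slot per 4-byte word */
    uint32_t esp;              /* simulated stack pointer, as a byte offset into mem */
} ToyStack;

/* An empty stack starts with esp just past the highest address. */
static void toy_init(ToyStack *s) {
    s->esp = STACK_WORDS * 4;
}

/* push: first subtract 4 from esp, then store at the new top. */
static void toy_push(ToyStack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* pop: first load from the top, then add 4 to esp. */
static uint32_t toy_pop(ToyStack *s) {
    assert(s->esp + 4 <= STACK_WORDS * 4);
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}
```

Pushing 12 and then 20 moves esp from 64 down to 56, and popping twice returns 20, then 12, and leaves esp back at 64.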

8048394:8d 4c 24 04          lea    0x4(%esp),%ecx              ; ??

lea = Load Effective Address

This saves, in ecx, the address of whatever lies 4 bytes above the top of the stack. Since this is 32-bit x86 code (4-byte words), that is the second item on the stack. Since this is the entry of a function (main in this case), the 4 bytes at the top of the stack are the return address, so ecx now points just above it, at main's first argument.

8048398:83 e4 f0             and    $0xfffffff0,%esp            ; ??

This code makes sure that the stack is aligned to 16 bytes. After this operation esp will be less than or equal to what it was before, so the stack can only grow, which protects anything that might already be on it. This is sometimes done in main just in case the function is called with an unaligned stack, which can make things really slow and can break instructions that require aligned operands (some SSE loads and stores need 16-byte alignment; a cache line on x86 is wider than that, typically 64 bytes on current processors). For the plain 4-byte ints used here, 4-byte alignment is what really matters. If main has an unaligned stack, the rest of the program will too.
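The effect of the mask can be sketched in C. The function names are made up for illustration, and the example address 0xbffff73c is an assumed value, not one taken from the question.

```c
#include <stdint.h>

/* Same operation as `and $0xfffffff0,%esp`: clear the low four bits,
 * which rounds the address down to a multiple of 16. */
static uint32_t align_down_16(uint32_t esp) {
    return esp & 0xfffffff0u;
}

/* How many bytes the stack grows by as a result (always 0 to 15). */
static uint32_t bytes_skipped(uint32_t esp) {
    return esp - align_down_16(esp);
}
```

For an assumed esp of 0xbffff73c, `align_down_16` gives 0xbffff730 and `bytes_skipped` gives 12. An esp that is already a multiple of 16 is left unchanged.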

 804839b:ff 71 fc             pushl  -0x4(%ecx)                  ; ??

ecx was loaded earlier with the address just above the return address, so -0x4(%ecx) refers back to the return address of the current function. It is pushed again so that the new, aligned frame has the usual layout, with a return address directly above the saved ebp. The actual return still uses the original copy, because the epilogue restores esp from ecx. (push with a memory operand both loads from one place in RAM and stores to another in a single instruction.)

 804839e:55                   push   %ebp                        ; Store the Base pointer
 804839f:89 e5                mov    %esp,%ebp                   ; Initialize the Base pointer with the stack pointer
 80483a1:51                   push   %ecx                        ; ??
 80483a2:83 ec 4c             sub    $0x4c,%esp                  ; ??

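This is mostly the standard function prologue (the previous stuff was special for main). It makes a stack frame (the area between ebp and esp) where local variables can live. ebp is pushed so that the old stack frame can be restored in the epilogue (at the end of the current function). ecx is pushed because it holds the only record of where the stack was before it was aligned, and the sub reserves 0x4c (76) bytes for locals and scratch space.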

80483a5:c7 45 f8 0c 00 00 00 movl   $0xc,-0x8(%ebp)             ; Move 12 into -0x8(%ebp)
80483ac:c7 45 f4 14 00 00 00 movl   $0x14,-0xc(%ebp)            ; Move 20 into -0xc(%ebp)
80483b3:8b 45 f8             mov    -0x8(%ebp),%eax             ; Move 12@-0x8(%ebp) into eax
80483b6:83 c0 7b             add    $0x7b,%eax                  ; Add 123 to 12@eax
80483b9:89 45 f4             mov    %eax,-0xc(%ebp)             ; Store the result into b@-0xc(%ebp)

80483bc:b8 00 00 00 00       mov    $0x0,%eax                   ; Move 0 into eax

eax is where integer function return values are stored. This is setting up to return 0 from main.

80483c1:83 c4 10             add    $0x10,%esp                  ; ??
80483c4:59                   pop    %ecx                        ; ??
80483c5:5d                   pop    %ebp                        ; ??
80483c6:8d 61 fc             lea    -0x4(%ecx),%esp             ; ??

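This is the function epilogue. It is more difficult to understand because of the weird stack alignment code at the beginning. pop %ecx and pop %ebp restore the values saved in the prologue, and the final lea sets esp to ecx - 4, which is the address of the original return address. I'm having a little bit of trouble figuring out why the stack is adjusted by a smaller amount here (0x10) than in the prologue (0x4c), though: for pop %ecx to pick up the value pushed in the prologue, the two amounts would have to match.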
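One way to see why ecx has to be saved is to follow only the pointer arithmetic of the first two and the last instruction. This is a sketch in C with made-up names (`Entry`, `main_entry`, `main_exit`) and an assumed starting esp; it models register values only, not memory.

```c
#include <stdint.h>

typedef struct {
    uint32_t ecx;         /* address just above the return address */
    uint32_t aligned_esp; /* esp after alignment; the low bits are lost */
} Entry;

/* Models the first two instructions of main. */
static Entry main_entry(uint32_t esp_at_entry) {
    Entry e;
    e.ecx = esp_at_entry + 4;                   /* lea 0x4(%esp),%ecx   */
    e.aligned_esp = esp_at_entry & 0xfffffff0u; /* and $0xfffffff0,%esp */
    return e;
}

/* Models the last instruction shown: plain arithmetic, no memory access. */
static uint32_t main_exit(uint32_t ecx) {
    return ecx - 4;                             /* lea -0x4(%ecx),%esp  */
}
```

With an assumed entry esp of 0xbffff73c, the aligned esp is 0xbffff730, which no longer says where the return address lives. ecx is 0xbffff740, and `main_exit` turns it back into 0xbffff73c, the exact value esp had on entry.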

It is obvious that this particular code was not compiled with optimizations on. If it were, there probably wouldn't be much left, since the compiler can see that the end result of the program is the same even if it skips the math in your main. With programs that actually do something (have side effects or results) it is sometimes easier to read lightly optimized code (the -O1 or -Os arguments to gcc).

Reading assembly generated by a compiler is often much easier for functions that aren't main. If you want to read assembly in order to understand it, write yourself a function that takes some arguments and produces a result, or that works on global variables, and you will be able to follow it better.

Another thing that will probably help you is to just have gcc generate the assembly files for you, rather than disassembling them. The -S flag tells it to generate this (but not to generate other files), and names the assembly files with a .s on the end. This should be easier for you to read than the disassembled versions.
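As a sketch of the kind of function meant here, something like the following works well. The function name and the file name example.c are made up for illustration.

```c
/* example.c (hypothetical file name)
 *
 * A small non-main function whose result depends on its arguments,
 * so the compiler cannot discard the arithmetic.
 *
 * Running `gcc -S example.c` should write the assembly to example.s.
 * Adding -m32 asks for 32-bit code like the listing in the question,
 * assuming your gcc installation supports 32-bit targets.
 */
int scale_and_offset(int value, int factor) {
    int scaled = value * factor; /* a local, like a and b in the question */
    return scaled + 123;         /* the result is returned in eax on x86 */
}
```

Because it is not main, the output has no stack realignment preamble, only the ordinary prologue and epilogue around the arithmetic.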

nategoose answered Oct 06 '22 00:10


Not sure why the compiler does all this stuff, but here's what I can decipher:

 8048394:8d 4c 24 04          lea    0x4(%esp),%ecx              ; ecx := esp+4
 8048398:83 e4 f0             and    $0xfffffff0,%esp            ; align the stack to 16 bytes
 804839b:ff 71 fc             pushl  -0x4(%ecx)                  ; push [ecx-4] (the original [esp], i.e. the return address)
 80483a1:51                   push   %ecx                        ; push ecx
 80483a2:83 ec 4c             sub    $0x4c,%esp                  ; allocate 19 dwords on stack
 80483c1:83 c4 10             add    $0x10,%esp                  ; deallocate 4 dwords from stack
 80483c4:59                   pop    %ecx                        ; restore ecx
 80483c5:5d                   pop    %ebp                        ; and ebp
 80483c6:8d 61 fc             lea    -0x4(%ecx),%esp             ; esp := ecx-4 (lea does not read memory)
Jens Björnhager answered Oct 05 '22 23:10