 

Assembler debug of undefined expression

I'm trying to get a better understanding of how compilers generate code for expressions with undefined behaviour, e.g. the following code:

int main()
{
    int i = 5;
    i = i++;
    return 0;
}

This is the assembler code generated by gcc 4.8.2 (optimisation is off, -O0; I've inserted my own line numbers for reference):

(gdb) disassemble main
Dump of assembler code for function main:
(1) 0x0000000000000000 <+0>:    push   %rbp
(2) 0x0000000000000001 <+1>:    mov    %rsp,%rbp
(3) 0x0000000000000004 <+4>:    movl   $0x5,-0x4(%rbp)
(4) 0x000000000000000b <+11>:   mov    -0x4(%rbp),%eax
(5) 0x000000000000000e <+14>:   lea    0x1(%rax),%edx
(6) 0x0000000000000011 <+17>:   mov    %edx,-0x4(%rbp)
(7) 0x0000000000000014 <+20>:   mov    %eax,-0x4(%rbp)
(8) 0x0000000000000017 <+23>:   mov    $0x0,%eax
(9) 0x000000000000001c <+28>:   pop    %rbp
(10) 0x000000000000001d <+29>:  retq   
End of assembler dump.

Executing this code leaves i at 5 (verified with a printf() statement), i.e. i never appears to be incremented. I understand that different compilers may evaluate/compile undefined expressions in different ways, and this may just be the way gcc does it, i.e. I could get a different result with a different compiler.

With respect to the assembler code, as I understand:

Ignoring lines 1-2 (setting up the stack/base pointers etc.), lines 3-4 are how the value 5 is assigned to i.

Can anyone explain what is happening on lines 5-6? It looks as if i is ultimately reassigned the value 5 (line 7), but is the increment (required for the post-increment i++) simply abandoned/skipped by the compiler in this case?

user3742467 asked May 30 '15 12:05


2 Answers

These three lines contain your answer:

lea    0x1(%rax),%edx
mov    %edx,-0x4(%rbp)
mov    %eax,-0x4(%rbp)

The increment operation isn't skipped. lea is the increment: it takes the value from %rax and stores the incremented value in %edx. %edx is then stored to i, but that store is overwritten by the next line, which stores the original value from %eax.

The key to understanding this code is knowing how lea works. It stands for load effective address, so while it looks like a pointer dereference, it actually just does the arithmetic needed to compute the final address of [whatever] and keeps that address, instead of loading the value stored there. This means it can be used for any arithmetic expression that fits an addressing mode, as an alternative to the arithmetic opcodes. For this reason it's frequently used to get a multiply and an add into a single instruction. In this particular case it's used to increment the value and move the result into a different register in one instruction, whereas inc would overwrite the register in place.

Leushenko answered Nov 04 '22 12:11


Lines 5-6 are the i++. The lea 0x1(%rax),%edx computes i + 1, and mov %edx,-0x4(%rbp) writes that back to i. However, line 7, mov %eax,-0x4(%rbp), writes the original value back into i. The code looks like:

(4) eax = i
(5) edx = i + 1
(6) i = edx
(7) i = eax
Jester answered Nov 04 '22 12:11