I normally don't spend much time reading assembly, so the following compiler output confused me a little.
Say I compile this piece of C code on my Intel Core 2 Duo running OSX 10.6:
while (var != 69) // var is a global variable
{
    printf("Looping!\n");
}
The assembly for the "var != 69" comparison looks like:
cmpl $69, _var(%rip)
I understand that it effectively means to compare the value "69" against the contents of the global variable "var", but I'm having a tough time understanding the "_var(%rip)" part. Normally, I expect there to be an offset value, like when referring to local variables on the stack (e.g. -4(%ebp)). However, I don't quite follow how offsetting the instruction pointer with the "_var" declaration will give me the contents of the global variable "var".
What exactly does that line mean?
Thanks.
This works very nearly the same as addressing local variables on the stack with offset(%ebp). In this case, the linker will set the offset field of that instruction to the difference between the address of var and the value that %rip will have when that instruction executes. (If I remember correctly, that value is the address of the next instruction, because %rip always points to the instruction after the one currently executing.) The addition thus gives the address of var.
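To make the arithmetic concrete, here is a minimal sketch in C of what the linker and the CPU each compute. The function names and the addresses in the usage note are made up for illustration; they are not anything the toolchain actually exposes.

```c
#include <assert.h>
#include <stdint.h>

/* What the linker stores in the instruction's 32-bit offset field:
   the distance from the next instruction to the variable.
   (Hypothetical helper, for illustration only.) */
static int32_t rip_relative_disp(uint64_t var_addr, uint64_t next_insn_addr)
{
    int64_t diff = (int64_t)(var_addr - next_insn_addr);
    /* The field is a signed 32-bit value, so the variable has to be
       within roughly +/-2 GiB of the instruction. */
    assert(diff >= INT32_MIN && diff <= INT32_MAX);
    return (int32_t)diff;
}

/* What the CPU computes at run time: %rip (already pointing at the
   next instruction) plus the sign-extended offset. */
static uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}
```

For example, if the instruction after the cmpl sat at 0x100000f00 and var at 0x100001040 (both invented numbers), the stored offset would be 0x140, and adding it back to %rip gives 0x100001040 again.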
Why do it this way? This is a hallmark of position-independent code. If the compiler had generated
cmpl $69, _var
and the linker had filled in the absolute address of var, then when you ran the program, the executable image would always have to be loaded into memory at one specific address, so that all the variables had the absolute addresses that the code expects. By doing it this way, the only thing that has to be fixed is the distance between the code and the data; the code plus data (i.e. the complete executable image) can be loaded at any address and it'll still work.
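The difference between the two schemes can be modelled with a toy image in C. This is only a sketch: the offsets, the link-time base, and the function names are all assumptions chosen for illustration, and a real loader does a great deal more.

```c
#include <stdint.h>

/* Hypothetical image layout, as offsets from the start of the image. */
#define NEXT_INSN_OFFSET 0xf00ULL   /* instruction following the cmpl */
#define VAR_OFFSET       0x1040ULL  /* the global variable            */

/* Base address the linker assumed when it baked in absolute addresses. */
#define LINK_TIME_BASE   0x100000000ULL

/* Where var really lives once the image is loaded at load_base. */
static uint64_t actual_var_address(uint64_t load_base)
{
    return load_base + VAR_OFFSET;
}

/* Absolute addressing: the operand was fixed at link time, so it
   ignores where the image actually ended up. */
static uint64_t absolute_operand(uint64_t load_base)
{
    (void)load_base;
    return LINK_TIME_BASE + VAR_OFFSET;
}

/* %rip-relative addressing: only the code-to-data distance is baked
   in; the instruction pointer moves together with the image. */
static uint64_t rip_relative_operand(uint64_t load_base)
{
    uint64_t rip = load_base + NEXT_INSN_OFFSET;
    int32_t disp = (int32_t)(VAR_OFFSET - NEXT_INSN_OFFSET);
    return rip + (uint64_t)(int64_t)disp;
}
```

When load_base equals LINK_TIME_BASE, both operands match actual_var_address. At any other base only the %rip-relative one does, which is the whole point of the scheme.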
... Why bother? Why is it bad to have to load an executable at one specific address? It isn't, necessarily. Shared libraries have to be position-independent, because otherwise you might have two libraries that wanted to be loaded at overlapping addresses and you couldn't use both of them in the same program. (Some systems have dealt with this by keeping a global registry of all libraries and the space they require, but obviously this does not scale.) Making executables position-independent is largely done as a security measure: it's somewhat harder to exploit a buffer overflow if you don't know where the program's code is in memory (this is called address space layout randomization).