
X86 Assembly Instruction Pointer Addressing

Tags: c, x86, assembly

I normally don't spend much time reading assembly, so the following compiler output confused me a little.

Say I compile this piece of C code on my Intel Core 2 Duo running OSX 10.6:

while (var != 69) // var is a global variable
{
    printf("Looping!\n");
}

The assembly for the "var != 69" comparison looks like:

cmpl    $69, _var(%rip)

I understand that it effectively means to compare the value "69" against the contents of the global variable "var", but I'm having a tough time understanding the "_var(%rip)" part. Normally, I expect there to be an offset value, like for referring to local variables in the stack (e.g. -4(%ebp)). However, I don't quite follow how offsetting the instruction pointer with the "_var" declaration will give me the contents of the global variable "var".

What exactly does that line mean?

Thanks.

lhumongous asked Jun 27 '11 03:06


1 Answer

This works very nearly the same as addressing local variables on the stack with offset(%ebp). In this case, the linker sets the displacement field of that instruction to the difference between the address of var and the value that %rip will have when that instruction executes. (That value is the address of the next instruction, because by the time an instruction executes, %rip already points past it.) Adding the displacement to %rip thus gives the address of var.

Why do it this way? This is a hallmark of position-independent code. If the compiler had generated

cmpl $69, _var

and the linker had filled in the absolute address of var, then when you ran the program, the executable image would always have to be loaded into memory at one specific address, so that all the variables had the absolute addresses that the code expects. By doing it this way, the only thing that has to be fixed is the distance between the code and the data; the code plus data (i.e. the complete executable image) can be loaded at any address and it'll still work.

... Why bother? Why is it bad to have to load an executable at one specific address? It isn't, necessarily. Shared libraries have to be position-independent, because otherwise you might have two libraries that wanted to be loaded at overlapping addresses and you couldn't use both of them in the same program. (Some systems have dealt with this by keeping a global registry of all libraries and the space they require, but obviously this does not scale.) Making executables position-independent is largely done as a security measure: it lets the loader place the program at a different, randomly chosen address on each run (this is called address space layout randomization), and it's somewhat harder to exploit a buffer overflow if you don't know where the program's code is in memory.

zwol answered Nov 12 '22 20:11