Why are global variables in x86-64 accessed relative to the instruction pointer?

I tried compiling C code to assembly using gcc -S -fasm foo.c. The C code declares a global variable and a local variable in main, as shown below:

int y=6;
int main()
{
        int x=4;
        x=x+y;
        return 0;
}

I then looked at the assembly generated from this C code and saw that the global variable y is accessed using the value of the rip instruction pointer.

I thought that only const global variables were stored in the text segment, but from this example it seems that regular global variables are also stored in the text segment, which is very weird.

I guess that some assumption I made is wrong, so can someone please explain it to me?

The assembly code generated by the C compiler:

        .file   "foo.c"
        .text
        .globl  y
        .data
        .align 4
        .type   y, @object
        .size   y, 4
y:
        .long   6
        .text
        .globl  main
        .type   main, @function

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $4, -4(%rbp)
        movl    y(%rip), %eax
        addl    %eax, -4(%rbp)
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:

asked May 22 '19 18:05

roy cabouly

2 Answers

The offsets between different sections of your executable are link-time constants, so RIP-relative addressing is usable for any section (including .data where your non-const globals are). Note the .data in your asm output.

This applies even in a PIE executable or shared library, where the absolute addresses are not known until runtime (ASLR).

Runtime ASLR for position-independent executables (PIE) randomizes one base address for the entire program, not individual segment start addresses relative to each other.
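
A rough way to observe this yourself is sketched below. The helper names (anchor_in_text, text_to_data_distance, report_layout) are made up for illustration, and the sketch assumes GCC or Clang on x86-64 Linux. In a PIE build the printed absolute addresses would be expected to change from run to run, while the distance between the function in .text and the variable in .data should stay the same.

```c
#include <stdint.h>
#include <stdio.h>

int y = 6;                      /* non-const global: placed in .data */

/* Hypothetical reference point: any function in .text would do. */
int anchor_in_text(void) { return y; }

/* Distance in bytes from anchor_in_text (in .text) to y (in .data).
   Converting a function pointer to uintptr_t is implementation-defined,
   but it works with GCC and Clang on x86-64. */
intptr_t text_to_data_distance(void)
{
    return (intptr_t)((uintptr_t)&y - (uintptr_t)&anchor_in_text);
}

/* Print both addresses and their distance. Call this from main and
   run the program a few times to compare the output. */
void report_layout(void)
{
    printf("anchor_in_text = %p\n", (void *)(uintptr_t)&anchor_in_text);
    printf("y              = %p\n", (void *)&y);
    printf("distance       = %ld\n", (long)text_to_data_distance());
}
```

Calling report_layout() from main in a binary built with -fpie -pie and running it repeatedly is one way to compare the values across runs.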

All access to static variables uses RIP-relative addressing because that's most efficient, even in a position-dependent executable where absolute addressing is an option (because absolute addresses of static code/data are link-time constants, not relocated by dynamic linking in that case.)

Related and maybe duplicates:

Why is the address of static variables relative to the Instruction Pointer?
Why does this MOVSS instruction use RIP-relative addressing?

In 32-bit x86, there are 2 redundant ways to encode an addressing mode with no registers and a disp32 absolute address (with and without a SIB byte). x86-64 repurposed the shorter one as RIP+rel32, so mov foo, %eax is 1 byte longer than mov foo(%rip), %eax.

64-bit absolute addressing would take even more space, and is only available for mov to/from RAX/EAX/AX/AL unless you use a separate instruction to get the address into a register first.
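
To make the size comparison concrete, here is a small sketch that writes out the three encodings of a load into EAX as byte arrays. The array and function names are invented for this example, and the address bytes are zero placeholders that the assembler or linker would normally fill in. You can cross-check the layouts by assembling the corresponding instructions and inspecting them with objdump -d.

```c
#include <stddef.h>

/* mov foo(%rip), %eax
   opcode 8B, ModRM 05 (mod=00, reg=eax, rm=101 means RIP+rel32), rel32 */
static const unsigned char mov_rip_rel[] = {
    0x8B, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* mov foo, %eax   (32-bit absolute address)
   opcode 8B, ModRM 04 (rm=100 means a SIB byte follows),
   SIB 25 (no index, no base, disp32 only), disp32 */
static const unsigned char mov_abs32[] = {
    0x8B, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00
};

/* movabs foo, %eax   (64-bit absolute address, accumulator only)
   opcode A1 followed by an 8-byte address */
static const unsigned char mov_abs64[] = {
    0xA1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Encoded length in bytes of each form. */
size_t len_rip_rel(void) { return sizeof mov_rip_rel; }
size_t len_abs32(void)   { return sizeof mov_abs32; }
size_t len_abs64(void)   { return sizeof mov_abs64; }
```

The RIP-relative form is the shortest of the three, which matches the sizes discussed above.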

(In x86-64 Linux PIE/PIC, 64-bit absolute addressing is allowed, and handled via load-time fixups to put the right address into the code or jump table or statically-initialized function pointer. So code doesn't technically have to be position-independent, but normally it's more efficient to be. And 32-bit absolute addressing isn't allowed, because ASLR isn't limited to the low 31 bits of virtual address space.)

Note that in a non-PIE Linux executable, gcc will use 32-bit absolute addressing for putting the address of static data in a register. e.g. puts("hello"); will typically compile as

mov   $.LC0, %edi     # mov r32, imm32
call  puts

In the default non-PIE memory model, static code and data get linked into the low 32 bits of virtual address space, so 32-bit absolute addresses work whether they're zero- or sign-extended to 64-bit. This is handy for indexing static arrays, too, like mov array(%rax), %edx ; add $4, %eax for example.
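
As a sketch of the array-indexing case, consider the C function below. The name load_element and the array contents are made up for this example, and the assembly in the comments is only what GCC would typically be expected to emit at -O2, so treat it as an assumption and check your own compiler's output with gcc -S.

```c
/* Global array: in a non-PIE executable its address is a link-time
   constant that fits in 32 bits. */
int array[4] = { 10, 20, 30, 40 };

/* Non-PIE build (for example gcc -O2 -fno-pie -no-pie): the array's
   absolute address can be used directly as a 32-bit displacement,
   typically something like
       movslq  %edi, %rdi
       movl    array(,%rdi,4), %eax

   PIE build: there is no RIP + index addressing mode, so the address
   is typically materialized first with a RIP-relative LEA:
       leaq    array(%rip), %rax
       movl    (%rax,%rdi,4), %eax                                   */
int load_element(int i)
{
    return array[i];
}
```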

See 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIE executables, which use position-independent code for everything, including RIP-relative LEA like 7-byte lea .LC0(%rip), %rdi instead of 5-byte mov $.LC0, %edi. See also How to load address of function or label into register.

I mention Linux because it looks from the .cfi directives like you're compiling for a non-Windows platform.

answered Oct 17 '22 02:10

Peter Cordes

Although the .data and .text segments are independent of one another, once linked, their offsets relative to one another are fixed (at least in the gcc x86-64 -mcmodel=small code model, which is the default code model and works for all programs whose code+data is less than 2GB).

So wherever the system loads an executable in the process's address space, the instructions and the data they reference will have fixed offsets relative to one another.

For these reasons, x86-64 programs compiled for the (default) small code model use RIP-relative addressing for both code and global data. Doing so means the compiler doesn't need to dedicate a register to point to wherever the system loaded the executable's .data section; the program already knows its own RIP value and the offset between that and the global data it wants to access, so the most efficient way of accessing it is via a 32-bit fixed offset from RIP.

(Absolute 32-bit addressing modes would take more space, and 64-bit absolute addressing modes are even less efficient, and only available for RAX/EAX/AX/AL.)
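
The arithmetic behind that fixed 32-bit offset can be sketched in C as follows. The function names and the example addresses are hypothetical. The key points are that RIP holds the address of the next instruction when the displacement is applied, and that the displacement must fit in a signed 32-bit value, which is where the 2GB limit of the small code model comes from.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand. While an instruction
   executes, RIP already points at the NEXT instruction, so the base is
   insn_addr + insn_len. rel32 is sign-extended to 64 bits. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len,
                             int32_t rel32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)rel32;
}

/* What a linker has to do: compute rel32 for a given target.
   Returns false if the target is outside the signed 32-bit range
   (about +/-2GiB) reachable from this instruction. */
bool compute_rel32(uint64_t insn_addr, uint64_t insn_len,
                   uint64_t target, int32_t *rel32_out)
{
    int64_t delta = (int64_t)(target - (insn_addr + insn_len));
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *rel32_out = (int32_t)delta;
    return true;
}
```

Adding the same load base to both insn_addr and target leaves the computed rel32 unchanged, which is why the displacement can be fixed at link time even when the load address is randomized.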

You can find more info about this on Eli Bendersky's website: Understanding the x64 code models

answered Oct 17 '22 02:10

phonetagger

Tags: c, assembly, x86-64, compiler-construction