Why use the Global Offset Table for symbols defined in the shared library itself?

Tags: c++, symbols, assembly, dynamic-linking, got

Consider the following simple shared library source code:

library.cpp:

static int global = 10;

int foo()
{
    return global;
}

Compiled with clang using the -fPIC option, it produces this assembly (x86-64):

foo(): # @foo()
  push rbp
  mov rbp, rsp
  mov eax, dword ptr [rip + global]
  pop rbp
  ret
global:
  .long 10 # 0xa

Since the symbol is defined inside the library, the compiler uses PC-relative addressing, as expected: mov eax, dword ptr [rip + global]

However, if we change static int global = 10; to int global = 10;, making it a symbol with external linkage, the resulting assembly is:

foo(): # @foo()
  push rbp
  mov rbp, rsp
  mov rax, qword ptr [rip + global@GOTPCREL]
  mov eax, dword ptr [rax]
  pop rbp
  ret
global:
  .long 10 # 0xa

As you can see, the compiler added a layer of indirection through the Global Offset Table, which seems totally unnecessary in this case, as the symbol is still defined inside the same library (and the same source file).

If the symbol were defined in another shared library, the GOT would be necessary, but in this case it feels redundant. Why is the compiler still adding this symbol to the GOT?

Note: I believe this question is similar to this one; however, the answer there was not pertinent, possibly due to a lack of details.

asked Apr 09 '19 07:04

yggdrasil

1 Answer

The Global Offset Table serves two purposes. One is to allow the dynamic linker to "interpose" a different definition of the variable from the executable or another shared object. The second is to allow position-independent code to be generated for references to variables on certain processor architectures.

ELF dynamic linking treats the entire process, the executable and all of the shared objects (dynamic libraries), as sharing one single global namespace. If multiple components (executables or shared objects) define the same global symbol, then the dynamic linker normally chooses one definition of that symbol, and all references to that symbol in all components refer to that one definition. (However, ELF dynamic symbol resolution is complex, and for various reasons different components can end up using different definitions of the same global symbol.)

To implement this, when building a shared library the compiler accesses global variables indirectly through the GOT. For each variable, an entry is created in the GOT containing a pointer to the variable. As your example code shows, the compiler then uses this entry to obtain the address of the variable instead of accessing it directly. When the shared object is loaded into a process, the dynamic linker determines whether any of its global variables have been superseded by definitions in another component. If so, the GOT entries for those variables are updated to point at the superseding definitions.
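
The mechanism can be modelled in ordinary C++ as a pointer that the loader is allowed to overwrite. This is only a conceptual sketch: a real GOT is laid out by the linker and filled in by the dynamic linker, and every name below (model, got_global, interpose, executable_global) is made up for illustration.

```cpp
// Conceptual model only; none of these names exist in a real toolchain.
namespace model {

int library_global = 10;     // the library's own definition of `global`
int executable_global = 42;  // a competing definition from another component

// One GOT slot per externally visible variable. It starts out pointing at
// the library's own definition.
int* got_global = &library_global;

// Roughly what
//   mov rax, qword ptr [rip + global@GOTPCREL]
//   mov eax, dword ptr [rax]
// does: load the pointer from the GOT slot, then load through it.
int foo() { return *got_global; }

// Stand-in for the dynamic linker rewriting the slot at load time when
// another component's definition wins symbol resolution.
void interpose(int* winner) { got_global = winner; }

}  // namespace model
```

In this model, foo() reads the library's value until interpose(&model::executable_global) is called, after which the same code reads the other definition without having been recompiled. That is the property the extra load buys.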

By using the "hidden" or "protected" ELF visibility attributes, it's possible to prevent a globally defined symbol from being superseded by a definition in another component, which removes the need to use the GOT on certain architectures. For example:

extern int global_visible;
extern int global_hidden __attribute__((visibility("hidden")));
static volatile int local;  // volatile, so it's not optimized away

int
foo() {
    return global_visible + global_hidden + local;
}

When compiled with -O3 -fPIC by the x86_64 port of GCC, this generates:

foo():
        mov     rcx, QWORD PTR global_visible@GOTPCREL[rip]
        mov     edx, DWORD PTR local[rip]
        mov     eax, DWORD PTR global_hidden[rip]
        add     eax, DWORD PTR [rcx]
        add     eax, edx
        ret

As you can see, only global_visible uses the GOT; global_hidden and local don't. The "protected" visibility works similarly: it prevents the definition from being superseded but keeps it visible to the dynamic linker, so it can still be accessed by other components. The "hidden" visibility hides the symbol completely from the dynamic linker.
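
To try all three visibility levels in one place, here is a minimal sketch of a library source that defines the variables itself instead of declaring them extern. The name global_protected and the initial values are assumptions added for illustration; the visibility attribute is a GCC/Clang extension.

```cpp
// Hypothetical library source. Variable names mirror the example above,
// except global_protected, which is added here for illustration.

// Default visibility: exported, and may be interposed by another component.
int global_visible = 1;

// Protected: still exported, but references from inside this library are
// meant to bind to this definition.
__attribute__((visibility("protected"))) int global_protected = 2;

// Hidden: not exported to the dynamic linker at all.
__attribute__((visibility("hidden"))) int global_hidden = 3;

// Internal linkage; volatile so the load is not optimized away.
static volatile int local = 4;

int foo() {
    return global_visible + global_protected + global_hidden + local;
}
```

Building this with something like g++ -O3 -fPIC -shared and disassembling the result with objdump -d shows which accesses go through the GOT. Toolchains have differed in how they treat protected data, so it is worth checking the disassembly rather than assuming.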

The need to make code relocatable, so that shared objects can be loaded at different addresses in different processes, means that statically allocated variables, whether they have global or local scope, can't be accessed directly with a single instruction on most architectures. The only exception I know of is the 64-bit x86 architecture, as you can see above. It supports memory operands that are PC-relative and have large 32-bit displacements that can reach any variable defined in the same component.

On all the other architectures I'm familiar with, accessing variables in a position-independent manner requires multiple instructions. Exactly how varies greatly by architecture, but it often involves using the GOT. For example, if you compile the example C code above with the x86_64 port of GCC using the -m32 -O3 -fPIC options, you get:

foo():
        call    __x86.get_pc_thunk.dx
        add     edx, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        push    ebx
        mov     ebx, DWORD PTR global_visible@GOT[edx]
        mov     ecx, DWORD PTR local@GOTOFF[edx]
        mov     eax, DWORD PTR global_hidden@GOTOFF[edx]
        add     eax, DWORD PTR [ebx]
        pop     ebx
        add     eax, ecx
        ret
__x86.get_pc_thunk.dx:
        mov     edx, DWORD PTR [esp]
        ret

The GOT is used for all three variable accesses, but if you look closely, global_hidden and local are handled differently from global_visible. With the latter, a pointer to the variable is loaded from the GOT and then dereferenced; the former two are accessed directly, at a fixed offset from the GOT base (@GOTOFF). This is a fairly common trick among architectures where the GOT is used for all position-independent variable references.
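
The difference between the @GOT and @GOTOFF accesses in the listing above can be modelled in plain C++. This is a rough sketch under a strong simplifying assumption: the "loaded image" is a single struct whose first member is the GOT slot, so the address of the struct stands in for the GOT base held in edx. Image, load_gotoff and load_via_got are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// Conceptual model only: a hypothetical loaded image in which the GOT and
// the library's data sit at fixed distances from each other.
struct Image {
    int* got_global_visible;  // GOT slot: holds the address of global_visible
    int  global_hidden;
    int  local;
    int  global_visible;      // the library's own definition
};

// Models `mov eax, DWORD PTR sym@GOTOFF[edx]`: a single load at a
// link-time constant offset from the GOT base.
int load_gotoff(const Image& img, std::size_t offset) {
    const char* got_base = reinterpret_cast<const char*>(&img);
    int value;
    std::memcpy(&value, got_base + offset, sizeof value);
    return value;
}

// Models `mov ebx, DWORD PTR sym@GOT[edx]` followed by a load through ebx:
// fetch the pointer stored in the GOT slot, then dereference it.
int load_via_got(const Image& img) {
    return *img.got_global_visible;
}

int foo(const Image& img) {
    return load_via_got(img)
         + load_gotoff(img, offsetof(Image, global_hidden))
         + load_gotoff(img, offsetof(Image, local));
}
```

Pointing got_global_visible at a variable outside the struct changes what foo() returns, while the two offset-based loads cannot be redirected. That mirrors why only the default-visibility variable needs a real GOT entry.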

The 32-bit x86 architecture is exceptional in one way here, since it has large 32-bit displacements and a 32-bit address space. This means that any location in memory can be accessed relative to the GOT base, not just the GOT itself. Most other architectures only support much smaller displacements, which makes the maximum distance something can be from the GOT base much smaller. Other architectures that use this trick only put small (local/hidden/protected) variables in the GOT itself; large variables are stored outside the GOT, and the GOT contains a pointer to the variable, just as with normal-visibility global variables.

answered Nov 15 '22 19:11

Ross Ridge
