Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

x86_64: Is it possible to "in-line substitute" PLT/GOT references?

I'm not sure what a good subject line for this question is, but here we go:

In order to force code locality/compactness for a critical section of code, I'm looking for a way to call a function in an external (dynamically-loaded) library through a "jump slot" (an ELF R_X86_64_JUMP_SLOT relocation) directly at the call site - what the linker ordinarily puts into PLT / GOT, but have these inlined right at the call site.

If I emulate the call like:

#include <stdio.h>
int main(int argc, char **argv)
{
        asm ("push $1f\n\t"
             "jmp *0f\n\t"
             "0: .quad %P0\n"
             "1:\n\t"
             : : "i"(printf), "D"("Hello, World!\n"));
        return 0;
}

To get the space for a 64bit word, the call itself works (please, no comments about this being lucky coincidence as this breaks certain ABI rules - all these are not subject of this question.

For my case, be worked around/addressed in other ways, I'm trying to keep this example brief).

It creates the following assembly:

0000000000000000 <main>:
0:   bf 00 00 00 00          mov    $0x0,%edi
1: R_X86_64_32  .rodata.str1.1
5:   68 00 00 00 00          pushq  $0x0
6: R_X86_64_32  .text+0x19
a:   ff 24 25 00 00 00 00    jmpq   *0x0
d: R_X86_64_32S .text+0x11
...
11: R_X86_64_64 printf
19:   31 c0                   xor    %eax,%eax
1b:   c3                      retq

But (due to using printf as the immediate, I guess ... ?) the target address here is still that of the PLT hook - the same R_X86_64_64 reloc. Linking the object file against libc into an actual executable results in:

0000000000400428 <printf@plt>:
  400428:       ff 25 92 04 10 00       jmpq   *1049746(%rip)        # 5008c0 <_GLOBAL_OFFSET_TABLE_+0x20>
[ ... ]
0000000000400500 <main>:
  400500:       bf 0c 06 40 00          mov    $0x40060c,%edi
  400505:       68 19 05 40 00          pushq  $0x400519
  40050a:       ff 24 25 11 05 40 00    jmpq   *0x400511
  400511:       [ .quad 400428 ]
  400519:       31 c0                   xorl   %eax, %eax
  40051b:       c3                      retq
[ ... ]
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE
[ ... ]
00000000005008c0 R_X86_64_JUMP_SLOT  printf

I.e. this still gives the two-step redirection, first transfer execution to the PLT hook, then jump into the library entry point.

Is there a way how I can instruct the compiler/assembler/linker to - in this example - "inline" the jump slot target at address 0x400511?

I.e. replace the "local" (resolved at program link time by ld) R_X86_64_64 reloc with the "remote" (resolved at program load time by ld.so) R_X86_64_JUMP_SLOT one (and force non-lazy-load for this section of code) ? Maybe linker mapfiles might make this possible - if so, how?

Edit:
To make this clear, the question is about how to achieve this in a dynamically-linked executable / for an external function that's only available in a dynamic library. Yes, it's true static linking resolves this in a simpler way, but:

  • There are systems (like Solaris) where static libraries are generally not shipped by the vendor
  • There are libraries that aren't available as either source code or static versions

Hence static linking is not helpful here :(

Edit2:
I've found that in some architectures (SPARC, noticeably, see section on SPARC relocations in the GNU as manual), GNU is able to create certain types of relocation references for the linker in-place using modifiers. The quoted SPARC one would use %gdop(symbolname) to make the assembler emit instructions to the linker stating "create that relocation right here". Intel's assembler on Itanium knows the @fptr(symbol) link-relocation operator for the same kind of thing (see also section 4 in the Itanium psABI). But does an equivalent mechanism - something to instruct the assembler to emit a specific linker relocation type at a specific position in the code - exist for x86_64?

I've also found that the GNU assembler has a .reloc directive which supposedly is to be used for this purpose; still, if I try:

#include <stdio.h>
int main(int argc, char **argv)
{
        asm ("push %%rax\n\t"
             "lea 1f(%%rip), %%rax\n\t"
             "xchg %%rax, (%rsp)\n\t"
             "jmp *0f\n\t"
             ".reloc 0f, R_X86_64_JUMP_SLOT, printf\n\t"
             "0: .quad 0\n"
             "1:\n\t"
             : : "D"("Hello, World!\n"));
        return 0;
}

I get an error from the linker (note that 7 == R_X86_64_JUMP_SLOT):

error: /tmp/cc6BUEZh.o: unexpected reloc 7 in object file
The assembler creates an object file for which readelf says:
Relocation section '.rela.text.startup' at offset 0x5e8 contains 2 entries:
Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000001  000000050000000a R_X86_64_32            0000000000000000 .rodata.str1.1 + 0
0000000000000017  0000000b00000007 R_X86_64_JUMP_SLOT     0000000000000000 printf + 0

This is what I want - but the linker doesn't take it.
The linker does accept just using R_X86_64_64 instead above; doing that creates the same kind of binary as in the first case ... redirecting to printf@plt, not the "resolved" one.

like image 238
FrankH. Avatar asked Jun 01 '12 11:06

FrankH.


2 Answers

This optimization has since been implemented in GCC. It can be enabled with the -fno-plt option and the noplt function attribute:

Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations. On architectures such as 32-bit x86 where PLT stubs expect the GOT pointer in a specific register, this gives more register allocation freedom to the compiler. Lazy binding requires use of the PLT; with -fno-plt all external symbols are resolved at load time.

Alternatively, the function attribute noplt can be used to avoid calls through the PLT for specific external functions.

In position-dependent code, a few targets also convert calls to functions that are marked to not use the PLT to use the GOT instead.

like image 162
Florian Weimer Avatar answered Oct 17 '22 07:10

Florian Weimer


In order to inline the call you would need a code (.text) relocation whose result is the final address of the function in the dynamically loaded shared library. No such relocation exists (and modern static linkers don't allow them) on x86_64 using a GNU toolchain for GNU/Linux, therefore you cannot inline the entire call as you wish to do.

The closest you can get is a direct call through the GOT (avoids PLT):

    .section    .rodata
.LC0:
    .string "Hello, World!\n"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %eax
    movq    %rax, %rdi
    call    *printf@GOTPCREL(%rip)
    nop
    popq    %rbp
    ret
    .size   main, .-main

This should generate a R_X86_64_GLOB_DAT relocation against printf in the GOT to be used by the sequence above. You need to avoid C code because in general the compiler may use any number of caller-saved registers in the prologue and epilogue, and this forces you to save and restore all such registers around the asm function call or risk corrupting those registers for later use in the wrapper function. Therefore it is easier to write the wrapper in pure assembly.

Another option is to compile with -Wl,-z,now -Wl,-z,relro which ensures the PLT and PLT-related GOT entries are resolved at startup to increase code locality and compactness. With full RELRO you'll only have to run code in the PLT and access data in the GOT, two things which should already be somewhere in the cache hierarchy of the logical core. If full RELRO is enough to meet your needs then you wouldn't need wrappers and you would have added security benefits.

The best options are really static linking or LTO if they are available to you.

like image 25
Carlos O'Donell Avatar answered Oct 17 '22 05:10

Carlos O'Donell