Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do x86-64 Linux system calls work with 6 registers set?

I'm writing a freestanding program in C that depends only on the Linux kernel.

I studied the relevant manual pages and learned that on x86-64 the Linux system call entry point receives the system call number and six arguments through the seven registers rax, rdi, rsi, rdx, r10, r8, and r9.

Does this mean that every system call accepts six arguments?

I researched the source code of several libc implementations in order to find out how they perform system calls. Interestingly, musl contains two distinct approaches to system calls:

  1. src/internal/x86_64/syscall.s

    This assembly source file defines one __syscall function that moves the system call number and exactly six arguments to the registers defined in the ABI. The generic name of the function hints that it can be used with any system call, despite the fact it always passes six arguments to the kernel.

  2. arch/x86_64/syscall_arch.h

    This C header file defines seven separate __syscallN functions, with N specifying their arity. This suggests that the benefit of passing only the exact number of arguments that the system call requires surpasses the cost of having and maintaining seven nearly identical functions.

So I tried it myself:

long
system_call(long number,
            long _1, long _2, long _3, long _4, long _5, long _6)
{
    long value;

    register long r10 __asm__ ("r10") = _4;
    register long r8  __asm__ ("r8")  = _5;
    register long r9  __asm__ ("r9")  = _6;

    __asm__ volatile ( "syscall"
                     : "=a" (value)
                     : "a" (number), "D" (_1), "S" (_2), "d" (_3), "r" (r10), "r" (r8), "r" (r9)
                     : "rcx", "r11", "cc", "memory");

    return value;
}

int main(void) {
    static const char message[] = "It works!" "\n";

    /* system_call(write, standard_output, ...); */
    system_call(1, 1, message, sizeof message, 0, 0, 0);

    return 0;
}

I ran this program and verified that it does write It works!\n to standard output. This left me with the following questions:

  • Why can I pass more parameters than the system call takes?
  • Is this reasonable, documented behavior?
  • What am I supposed to set the unused registers to?
    • Is 0 okay?
  • What will the kernel do with the registers it doesn't use?
    • Will it ignore them?
  • Is the seven function approach faster by virtue of having less instructions?
    • What happens to the other registers in those functions?
like image 456
Matheus Moreira Avatar asked Aug 13 '17 20:08

Matheus Moreira


1 Answers

System calls accept up to 6 arguments, passed in registers (almost the same registers as the SysV x64 C ABI, with r10 replacing rcx but they are callee preserved in the syscall case), and "extra" arguments are simply ignored.

Some specific answers to your questions below.

The src/internal/x86_64/syscall.s is just a "thunk" which shifts all the all the arguments into the right place. That is, it converts from a C-ABI function which takes the syscall number and 6 more arguments, into a "syscall ABI" function with the same 6 arguments and the syscall number in rax. It works "just fine" for any number of arguments - the additional register movement will simply be ignored by the syscall if those arguments aren't used.

Since in the C-ABI all the argument registers are considered scratch (i.e., caller-save), clobbering them is harmless if you assume this __syscall method is called from C. In fact the kernel makes stronger guarantees about clobbered registers, clobbering only rcx and r11 so assuming the C calling convention is safe but pessimistic. In particular, the code calling __syscall as implemented here will unnecessarily save any argument and scratch registers per the C ABI, despite the kernel's promise to preserve them.

The arch/x86_64/syscall_arch.h file is pretty much the same thing, but in a C header file. Here, you want all seven versions (for zero to six arguments) because modern C compilers will warn or error if you call a function with the wrong number of arguments. So there is no real option to have "one function to rule them all" as in the assembly case. This also has the advantage of doing less work syscalls that take less than 6 arguments.

Your listed questions, answered:

  • Why can I pass more parameters than the system call takes?

Because the calling convention is mostly register-based and caller cleanup. You can always pass more arguments in this situation (including in the C ABI) and the other arguments will simply be ignored by the callee. Since the syscall mechanism is generic at the C and .asm level, there is no real way the compiler can ensure you are passing the right number of arguments - you need to pass the right syscall id and the right number of arguments. If you pass less, the kernel will see garbage, and if you pass more, they will be ignored.

  • Is this reasonable, documented behavior?

Yes, sure - because the whole syscall mechanism is a "generic gate" into the kernel. 99% of the time you aren't going to use that: glibc wraps the vast majority of interesting syscalls in C ABI wrappers with the correct signature so you don't have to worry about. Those are the ways that syscall access happens safely.

  • What am I supposed to set the unused registers to?

You don't set them to anything. If you use the C prototypes arch/x86_64/syscall_arch.h the compiler just takes care of it for you (it doesn't set them to anything) and if you are writing your own asm, you don't set them to anything (and you should assume they are clobbered after the syscall).

  • What will the kernel do with the registers it doesn't use?

It is free to use all the registers it wants, but will adhere to the kernel calling convention which is that on x86-64 all registers other than rax, rcx and r11 are preserved (which is why you see rcx and r11 in the clobber list in the C inline asm).

  • Is the seven function approach faster by virtue of having less instructions?

Yes, but the difference is very small since the reg-reg mov instructions are usually have zero latency and have high throughput (up to 4/cycle) on recent Intel architectures. So moving an extra 6 registers perhaps takes something like 1.5 cycles for a syscall that is usually going to take at least 50 cycles even if it does nothing. So the impact is small, but probably measurable (if you measure very carefully!).

  • What happens to the other registers in those functions?

I'm not sure what you mean exactly, but the other registers can be used just like all GP registers, if the kernel wants to preserve their values (e.g., by pushing them on the stack and then poping them later).

like image 188
BeeOnRope Avatar answered Oct 12 '22 23:10

BeeOnRope