I've noticed some really weird behavior while playing with libc's system() function on x86-64 Linux: sometimes the call to system() fails with a segmentation fault. Here's what I found after debugging it with gdb.
The segmentation fault is caused by this line:
=> 0x7ffff7a332f6 <do_system+1094>: movaps XMMWORD PTR [rsp+0x40],xmm0
According to the manual, this is the cause of the SIGSEGV:
When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) is generated.
Looking deeper, I noticed that my rsp value was indeed not 16-byte aligned (that is, its hex representation didn't end with 0). Manually modifying rsp right before the call to system() actually makes everything work.
So I've written the following program:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    register long long int sp asm ("rsp");
    printf("%llx\n", sp);
    if (sp & 0x8) /* == 0x8*/
    { 
        printf("running system...\n");
        system("touch hi");
    } 
    return 0;
}
Compiled with gcc 7.3.0. And sure enough, when observing the output:
sha@sha-desktop:~/Desktop/tda$ ltrace -f ./o_sample2
[pid 26770] printf("%llx\n", 0x7ffe3eabe6c87ffe3eabe6c8
)                                           = 13
[pid 26770] puts("running system..."running system...
)                                                  = 18
[pid 26770] system("touch hi" <no return ...>
[pid 26771] --- SIGSEGV (Segmentation fault) ---
[pid 26771] +++ killed by SIGSEGV +++
[pid 26770] --- SIGCHLD (Child exited) ---
[pid 26770] <... system resumed> )           = 139
[pid 26770] +++ exited (status 0) +++
So with this program, I cannot execute system() whatsoever.
One small thing, and I cannot tell if it's relevant to the problem: almost all of my runs end up with a bad rsp value and a child that is killed by SIGSEGV.
This makes me wonder a few things:
Why does system() mess around with the xmm registers?
Am I missing something about how to use the system() function properly?
Thanks in advance.
The x86-64 System V ABI guarantees 16-byte stack alignment before a call, so libc system is allowed to take advantage of that for 16-byte aligned loads/stores.  If you break the ABI, it's your problem if things crash.
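To make the alignment requirement concrete, here is a minimal sketch of the check that an aligned 16-byte store like `movaps [rsp+0x40], xmm0` effectively imposes on RSP. The function names and the example addresses are made up for illustration; only the "multiple of 16" rule comes from the manual text quoted in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of 16, i.e. its last hex digit is 0. */
bool is_aligned_16(uint64_t addr) {
    return (addr & 0xF) == 0;
}

/* Hypothetical helper: would an aligned 16-byte store to [rsp + offset] be legal? */
bool aligned_store_ok(uint64_t rsp, uint64_t offset) {
    return is_aligned_16(rsp + offset);
}

/* Illustrative values only; these addresses are invented for the example. */
void alignment_examples(void) {
    assert(aligned_store_ok(0x7ffc00001000ULL, 0x40));   /* rsp ends in 0: fine */
    assert(!aligned_store_ok(0x7ffc00001008ULL, 0x40));  /* rsp ends in 8: would fault */
}
```

Because 0x40 is itself a multiple of 16, the store is aligned exactly when RSP is, which is why an RSP ending in 8 is enough to trigger the fault.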
On entry to a function, after a call has pushed a return address, RSP±8 is 16-byte aligned (RSP itself is 8 bytes off a 16-byte boundary), and one more push will set you up to call another function.
GCC of course normally has no problem doing this, by using either an odd number of pushes or using a sub rsp, 16*n + 8 to reserve stack space.  Using a register-asm local variable with asm("rsp") doesn't break this, as long as you only read the variable, not assign to it.
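The arithmetic behind "an odd number of pushes or sub rsp, 16*n + 8" can be sketched with plain integers. This is only a model of the bookkeeping, not real stack manipulation: the starting address is a made-up value, and `rsp_mod16_at_call` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Models RSP % 16 at the point of a nested call, after a prologue that does
   `pushes` 8-byte pushes and reserves `reserved` bytes with sub rsp, reserved. */
unsigned rsp_mod16_at_call(unsigned pushes, unsigned reserved) {
    uint64_t rsp = 0x7ffffffff000ULL;  /* made-up RSP, 16-byte aligned before the caller's call */
    rsp -= 8;                          /* call pushed the return address */
    assert(rsp % 16 == 8);             /* state on function entry */
    rsp -= 8ULL * pushes;              /* each push moves RSP down by 8 */
    rsp -= reserved;                   /* sub rsp, reserved */
    return (unsigned)(rsp % 16);
}
```

In this model, `rsp_mod16_at_call(1, 0)`, `rsp_mod16_at_call(3, 0)`, and `rsp_mod16_at_call(0, 24)` all give 0 (aligned for the next call), while `rsp_mod16_at_call(0, 0)` and `rsp_mod16_at_call(2, 0)` give 8.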
You say you're using GCC7.3.  I put your code on the Godbolt compiler explorer and compiled it with -O3, -O2, -O1, and -O0.  It follows the ABI at all optimization levels, making a main that starts with sub rsp, 8 and doesn't modify RSP inside the function (except for call), until the end of the function.
So does every other version and optimization level of clang and gcc I checked.
This is gcc7.3 -O3's code-gen: note that it does not do anything to RSP except read it inside the function body, so if main is called with a valid RSP (16-byte aligned - 8), all of main's function calls will also be made with 16-byte aligned RSP.  (And it will never find sp & 8 true, so it will never call system in the first place.)
# gcc7.3 -O3
main:
        sub     rsp, 8
        xor     eax, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     rsi, rsp          # read RSP.
        call    printf
        test    spl, 8            # low 8 bits of RSP
        je      .L2
        mov     edi, OFFSET FLAT:.LC1
        call    puts
        mov     edi, OFFSET FLAT:.LC2
        call    system
.L2:
        xor     eax, eax
        add     rsp, 8
        ret
If you're calling main in some non-standard way, you're violating the ABI.  You don't explain that in the question, so this is not an MCVE.
As I explained in Does the C++ standard allow for an uninitialized bool to crash a program?, compilers are allowed to emit code that takes advantage of any guarantees the ABI of the target platform makes.  This includes using movaps for 16-byte loads/stores to copy stuff around on the stack, taking advantage of the incoming alignment guarantee.
It's a missed optimization that gcc doesn't optimize away the if() entirely, like clang does.
But clang is really treating it as an uninitialized variable: when the variable isn't used in an asm statement, the register-local asm("rsp") has no effect for clang, I think.  Clang leaves RSI unmodified before the first printf call, so clang's main actually prints argv, never reading RSP at all.
Clang is allowed to do this: the only supported use for register-asm local vars is making "r"(var) extended-asm constraints pick the register you want. (https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html).
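If the goal is just to read RSP without depending on how a compiler treats a register-asm local outside asm statements, one option is an extended-asm statement with an ordinary output operand. This is a minimal sketch, not something from the question: `read_stack_pointer` is a hypothetical name, the inline asm assumes x86-64 with GNU C extensions and the default AT&T dialect, and the fallback for other targets only approximates the stack pointer with the frame address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the stack pointer as seen inside this function. */
uint64_t read_stack_pointer(void) {
    uint64_t sp;
#if defined(__x86_64__) && defined(__GNUC__)
    /* Ordinary "=r" output: the compiler picks any general-purpose register. */
    __asm__ volatile ("mov %%rsp, %0" : "=r"(sp));
#else
    /* Assumption for other targets: the frame address is close enough for a demo. */
    sp = (uint64_t)(uintptr_t)__builtin_frame_address(0);
#endif
    assert(sp != 0);  /* sanity check only */
    return sp;
}
```

The value is the stack pointer inside the helper, so its low bits say nothing about the caller's alignment; it is useful for inspection, not for testing `sp & 8` as in the question.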
The manual doesn't imply that simply using such a variable in other ways can be problematic, so I think this code should be safe in general according to the written rules, as well as happening to work in practice.
The manual does say that using a call-clobbered register (like "rcx" on x86) would lead to the variable being clobbered by function calls, so perhaps a variable using rsp would be affected by compiler-generated push/pop?
This is an interesting test-case: see it on the Godbolt link.
// gcc won't compile this: "error: unable to find a register to spill"
// clang simply copies the value back out of RDX before idiv
int sink;
int divide(int a, int b) {
    register long long int dx asm ("rdx") = b;
    asm("" : "+r"(dx));  // actually make the compiler put the value in RDX
    sink = a/b;   // IDIV uses EDX as an input
    return dx;
}
Without the asm("" : "+r"(dx));, gcc compiles it just fine, never putting b into RDX at all.