GCC inline assembly with stack operation

I need inline assembly code that meets two requirements:

  • It contains a balanced pair of push/pop operations
  • It takes a variable in memory (not in a register) as an input

like this:

__asm__ __volatile__ ("push %%eax\n\t"
        // ... some operations that use ECX as a temporary
        "mov %0, %%ecx\n\t"
        // ... some other operation
        "pop %%eax"
: : "m"(foo));
// foo is my local variable, that is to say, on stack

When I disassemble the compiled code, the compiler addresses foo as something like 0xc(%esp), i.e. relative to ESP. This fragment therefore does not work correctly, because the push before the mov changes ESP. How can I tell the compiler to address foo relative to EBP instead, e.g. as -8(%ebp)?

P.S. You may suggest putting eax in the clobber list, but this is just sample code, and I'd rather not go into the full reason why that solution doesn't work for me.

Qiang asked Jan 03 '23 19:01


1 Answer

Modifying ESP inside inline asm should generally be avoided when you have any memory inputs / outputs; that way you don't have to disable optimizations or force the compiler to make a stack frame with EBP some other way. One major advantage of leaving ESP alone is that you (or the compiler) can then use EBP as an extra free register, which is potentially a significant speedup if you're already having to spill/reload stuff. If you're writing inline asm, presumably this is a hotspot, so it's worth spending the extra code size on ESP-relative addressing modes.

In x86-64 code, there's an added obstacle to using push/pop safely, because you can't tell the compiler you want to clobber the red-zone below RSP. (You can compile with -mno-red-zone, but there's no way to disable it from the C source.) You can get problems like this where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red-zone, though, so this only applies to x86-64 System V. (Or non-x86 ISAs with a red-zone.)
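A workaround often used for the red-zone problem, which is not part of the answer above, is to step RSP over the 128-byte red zone before pushing and to restore it afterwards. The sketch below assumes x86-64 System V and GCC or Clang; the function name read_rflags is made up for illustration. It only stays safe because the asm statement has no memory operands, so no RSP-relative address can go stale while RSP is moved.

```c
#include <stdint.h>

/* Sketch for x86-64 System V only (assumption): read RFLAGS via pushfq/pop.
   There are no "m" operands, so changing RSP inside the asm cannot
   invalidate an address the compiler substituted into the template. */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ __volatile__(
        "lea  -128(%%rsp), %%rsp\n\t"  /* skip the red zone; lea does not modify flags */
        "pushfq\n\t"                   /* push RFLAGS below the red zone */
        "pop  %[out]\n\t"              /* pop it into a compiler-chosen register */
        "lea  128(%%rsp), %%rsp"       /* put RSP back where the compiler expects it */
        : [out] "=r"(flags));
    return flags;
}
```

lea is used instead of sub/add so that the flags being read are not changed by the stack adjustment itself.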

You only need -fno-omit-frame-pointer for that function if you want to do asm-only things such as using push to build a stack data structure, where the number of pushes varies. Or maybe if you're optimizing for code size.

You can always write a whole non-inline function in asm and put it in a separate file, then you have full control. But only do that if your function is large enough to be worth the call/ret overhead, e.g. if it includes a whole loop; don't make the compiler call a short non-looping function inside a C inner loop, destroying all the call-clobbered registers and having to make sure globals are in sync.
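As a rough sketch of the "whole function in asm" route, here is a function containing a complete loop, written as a top-level asm statement so the example fits in one C file; in a real project the body would more likely live in a separate .S file. The name sum_bytes, the x86-64 System V calling convention, and ELF/Linux symbol naming are all assumptions of this example, not something the answer specifies.

```c
#include <stddef.h>

/* C-visible prototype for a function implemented entirely in assembly. */
unsigned sum_bytes(const unsigned char *p, size_t n);

/* Assumes x86-64 System V: p arrives in RDI, n in RSI, result goes in EAX.
   This is basic (top-level) asm, so registers are written with a single %. */
__asm__(
    ".text\n\t"
    ".globl sum_bytes\n\t"
    ".type sum_bytes, @function\n"
    "sum_bytes:\n\t"
    "xorl   %eax, %eax\n\t"        /* total = 0 */
    "testq  %rsi, %rsi\n\t"
    "je     2f\n"                  /* n == 0: nothing to add */
    "1:\n\t"
    "movzbl (%rdi), %ecx\n\t"      /* load one byte, zero-extended */
    "addl   %ecx, %eax\n\t"
    "incq   %rdi\n\t"
    "decq   %rsi\n\t"
    "jnz    1b\n"                  /* loop until n reaches 0 */
    "2:\n\t"
    "ret\n\t"
    ".size sum_bytes, .-sum_bytes\n"
);
```

Because the loop lives inside the function, the call/ret overhead is paid once per buffer rather than once per element.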


It seems you're using push / pop inside inline asm because you don't have enough registers, and need to save/reload something. You don't need to use push/pop for save/restore. Instead, use dummy output operands with "=m" constraints to get the compiler to allocate stack space for you, and use mov to/from those slots. (Of course you're not limited to mov; it can be a win to use a memory source operand for an ALU instruction if you only need the value once or twice.)

This may be slightly worse for code-size, but is usually not worse for performance (and can be better). If that's not good enough, write the whole function (or the whole loop) in asm so you don't have to wrestle with the compiler.
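Applied to a snippet shaped like the one in the question, the idea might look like the minimal sketch below. The function name scale_and_add and the arithmetic it performs are invented for illustration; only the pattern (save to an "=m" slot, reload from it, never touch ESP) reflects the approach described above.

```c
/* Computes x*foo + x. The original x is saved in a compiler-allocated
   stack slot instead of being pushed, so the stack pointer never moves
   and the memory operand for foo stays valid. */
static int scale_and_add(int x, int foo)
{
    int slot;      /* dummy output: spill slot chosen by the compiler */
    int acc = x;   /* read-write register operand */

    __asm__(
        "movl  %[acc], %[slot]\n\t"   /* save acc (takes the place of push) */
        "movl  %[foo], %%ecx\n\t"     /* memory input loaded into the ECX temporary */
        "imull %%ecx, %[acc]\n\t"     /* acc = x * foo, overwriting the old value */
        "addl  %[slot], %[acc]"       /* use the saved x as a memory source (no pop needed) */
        : [acc] "+&r"(acc), [slot] "=m"(slot)
        : [foo] "m"(foo)
        : "ecx", "cc");

    return acc;
}
```

The last instruction reads the slot directly as a memory source operand rather than reloading it into a register first, which is the small win mentioned above when the saved value is only needed once.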

int foo(char *p, int a, int b) {
    int t1,t2;  // dummy output spill slots
    int r1,r2;  // dummy output tmp registers
    int res;

    asm ("# operands: %0  %1  %2  %3  %4  %5  %6  %7  %8\n\t"
         "imull  $123, %[b], %[res]\n\t"
         "mov   %[res], %[spill1]\n\t"
         "mov   %[a], %%ecx\n\t"
         "mov   %[b], %[tmp1]\n\t"  // let the compiler allocate tmp regs, unless you need specific regs e.g. for a shift count
         "mov   %[spill1], %[res]\n\t"
    : [res] "=&r" (res),
      [tmp1] "=&r" (r1), [tmp2] "=&r" (r2),  // early-clobber
      [spill1] "=m" (t1), [spill2] "=&rm" (t2)  // allow spilling to a register if there are spare regs
      , [p] "+&r" (p)
      , "+m" (*(char (*)[]) p) // dummy in/output instead of memory clobber
    : [a] "rmi" (a), [b] "rm" (b)  // a can be an immediate, but b can't
    : "ecx"
    );

    return res;

    // p unused in the rest of the function
    // so it's really just an input to the asm,
    // which the asm is allowed to destroy
}

This compiles to the following asm with gcc7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm-comment showing what the compiler picked for all the template operands: it picked 12(%esp) for %[spill1] and %edi for %[spill2] (because I used "=&rm" for that operand, so the compiler saved/restored %edi outside the asm, and gave it to us for that dummy operand).

foo(char*, int, int):
    pushl   %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $16, %esp
    movl    36(%esp), %edx
    movl    %edx, %ebp
#APP
# 19 "/tmp/compiler-explorer-compiler118120-55-w92ge8.v797i/example.cpp" 1
        # operands: %eax  %ebx  %esi  12(%esp)  %edi  %ebp  (%edx)  40(%esp)  44(%esp)
    imull  $123, 44(%esp), %eax
    mov   %eax, 12(%esp)
    mov   40(%esp), %ecx
    mov   44(%esp), %ebx
    mov   12(%esp), %eax

# 0 "" 2
#NO_APP
    addl    $16, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

Hmm, the dummy memory operand to tell the compiler which memory we modify seems to have resulted in dedicating a register to that, I guess because the p operand is early-clobber so it can't use the same register. I guess you could risk leaving off the early-clobber if you're confident none of the other inputs will use the same register as p. (i.e. that they don't have the same value).
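The dummy memory operand can also be looked at in isolation. The following is only a sketch: the function name add_one_to_pair and the fixed length of two ints are assumptions made for the example. The pointer is a plain "r" input here (not early-clobber and not modified), and the "+m" operand on the array lvalue tells the compiler exactly which bytes the asm reads and writes, so a "memory" clobber is not needed.

```c
/* Adds 1 to p[0] and p[1] through the pointer register. */
static void add_one_to_pair(int *p)
{
    __asm__(
        "addl $1, (%[ptr])\n\t"
        "addl $1, 4(%[ptr])"
        : "+m"(*(int (*)[2])p)   /* dummy operand: both ints are read and written */
        : [ptr] "r"(p)           /* the address the template actually uses */
        : "cc");
}
```

The dummy operand never appears in the template; it exists only so that the compiler knows the pointed-to memory changes and does not keep stale copies of it in registers.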

Peter Cordes answered Jan 08 '23 13:01