I am in need of such a inline assembly code:
like this:
__asm__ __volatile__ ("push %%eax\n\t"
// ... some operations that use ECX as a temporary
"mov %0, %%ecx\n\t"
// ... some other operation
"pop %%eax"
: : "m"(foo));
// foo is my local variable, that is to say, on stack
When disassembling the compiled code, the compiler give the memory address like 0xc(%esp)
, it is relative to esp
, hence, this fragment of code will not works correctly since I have a push
operation before mov
.
Therefore, how can I tell the compile I do not like the foo
relative to esp
, but any thing like -8(%ebp)
relative to ebp.
P.S. You may suggest that I can put eax
inside the Clobbers, but it is just a sample code. I don't like to show the full reason why I don't accept this solution.
Modifying ESP inside inline-asm should generally be avoided when you have any memory inputs / outputs, so you don't have to disable optimizations or force the compiler to make a stack-frame with EBP some other way. One major advantage is that you (or the compiler) can then use EBP as an extra free register; potentially a significant speedup if you're already having to spill/reload stuff. If you're writing inline asm, presumably this is a hotspot so it's worth spending the extra code-size to use ESP-relative addressing modes.
In x86-64 code, there's an added obstacle to using push/pop safely, because you can't tell the compiler you want to clobber the red-zone below RSP. (You can compile with -mno-red-zone
, but there's no way to disable it from the C source.) You can get problems like this where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red-zone, though, so this only applies to x86-64 System V. (Or non-x86 ISAs with a red-zone.)
You only need -fno-omit-frame-pointer
for that function if you want to do asm-only stuff like push
as a stack data structure, so there's a variable amount of push. Or maybe if optimizing for code-size.
You can always write a whole non-inline function in asm and put it in a separate file, then you have full control. But only do that if your function is large enough to be worth the call/ret overhead, e.g. if it includes a whole loop; don't make the compiler call
a short non-looping function inside a C inner loop, destroying all the call-clobbered registers and having to make sure globals are in sync.
It seems you're using push
/ pop
inside inline asm because you don't have enough registers, and need to save/reload something. You don't need to use push/pop for save/restore. Instead, use dummy output operands with "=m"
constraints to get the compiler to allocate stack space for you, and use mov
to/from those slots. (Of course you're not limited to mov
; it can be a win to use a memory source operand for an ALU instruction if you only need the value once or twice.)
This may be slightly worse for code-size, but is usually not worse for performance (and can be better). If that's not good enough, write the whole function (or the whole loop) in asm so you don't have to wrestle with the compiler.
int foo(char *p, int a, int b) {
int t1,t2; // dummy output spill slots
int r1,r2; // dummy output tmp registers
int res;
asm ("# operands: %0 %1 %2 %3 %4 %5 %6 %7 %8\n\t"
"imull $123, %[b], %[res]\n\t"
"mov %[res], %[spill1]\n\t"
"mov %[a], %%ecx\n\t"
"mov %[b], %[tmp1]\n\t" // let the compiler allocate tmp regs, unless you need specific regs e.g. for a shift count
"mov %[spill1], %[res]\n\t"
: [res] "=&r" (res),
[tmp1] "=&r" (r1), [tmp2] "=&r" (r2), // early-clobber
[spill1] "=m" (t1), [spill2] "=&rm" (t2) // allow spilling to a register if there are spare regs
, [p] "+&r" (p)
, "+m" (*(char (*)[]) p) // dummy in/output instead of memory clobber
: [a] "rmi" (a), [b] "rm" (b) // a can be an immediate, but b can't
: "ecx"
);
return res;
// p unused in the rest of the function
// so it's really just an input to the asm,
// which the asm is allowed to destroy
}
This compiles to the following asm with gcc7.3 -O3 -m32
on the Godbolt compiler explorer. Note the asm-comment showing what the compiler picked for all the template operands: it picked 12(%esp)
for %[spill1]
and %edi
for %[spill2]
(because I used "=&rm"
for that operand, so the compiler saved/restore %edi
outside the asm, and gave it to us for that dummy operand).
foo(char*, int, int):
pushl %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $16, %esp
movl 36(%esp), %edx
movl %edx, %ebp
#APP
# 19 "/tmp/compiler-explorer-compiler118120-55-w92ge8.v797i/example.cpp" 1
# operands: %eax %ebx %esi 12(%esp) %edi %ebp (%edx) 40(%esp) 44(%esp)
imull $123, 44(%esp), %eax
mov %eax, 12(%esp)
mov 40(%esp), %ecx
mov 44(%esp), %ebx
mov 12(%esp), %eax
# 0 "" 2
#NO_APP
addl $16, %esp
popl %ebx
popl %esi
popl %edi
popl %ebp
ret
Hmm, the dummy memory operand to tell the compiler which memory we modify seems to have resulted in dedicating a register to that, I guess because the p
operand is early-clobber so it can't use the same register. I guess you could risk leaving off the early-clobber if you're confident none of the other inputs will use the same register as p
. (i.e. that they don't have the same value).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With