The following is excerpted from the GCC manual's Extended Asm documentation, which covers embedding assembly instructions in C with the asm keyword:
The same problem can occur if one output parameter (a) allows a register constraint and another output parameter (b) allows a memory constraint. The code generated by GCC to access the memory address in b can contain registers which might be shared by a, and GCC considers those registers to be inputs to the asm. As above, GCC assumes that such input registers are consumed before any outputs are written. This assumption may result in incorrect behavior if the asm statement writes to a before using b. Combining the ‘&’ modifier with the register constraint on a ensures that modifying a does not affect the address referenced by b. Otherwise, the location of b is undefined if a is modified before using b.
The manual says there may be "incorrect behavior" if the asm statement writes to a before using b.
I cannot figure out how such "incorrect behavior" could occur, so I would like a concrete asm example that demonstrates it and helps me understand this paragraph properly.
I can see how a problem could arise when two such asm statements run in parallel, but the paragraph above does not mention a multiprocessing scenario.
With only one CPU with a single core, can you show asm code that produces such incorrect behavior, that is, code where modifying a affects the address referenced by b such that the location of b is undefined?
The only assembly language I am familiar with is Intel x86 assembly, so please target the example at that platform.
Consider the following example:
extern int* foo();

int bar()
{
    int r;
    __asm__(
        "mov $0, %0 \n\t"
        "add %1, %0"
        : "=r" (r) : "m" (*foo()));
    return r;
}
The usual calling convention puts return values into the eax register. As such, there is a good chance the compiler decides to use eax throughout, to avoid unnecessary copying. The generated assembly may look like:
subl $12, %esp
call foo
mov $0, %eax
add (%eax), %eax
addl $12, %esp
ret
Notice that mov $0, %eax zeroes eax before the next instruction uses it to address the input operand, so add (%eax), %eax reads from address 0 instead of the pointer returned by foo(), and the code will most likely crash. With an early-clobber output ("=&r" instead of "=r"), you force the compiler to pick different registers. In my case, the resulting code was:
subl $12, %esp
call foo
mov $0, %edx
add (%eax), %edx
addl $12, %esp
movl %edx, %eax
ret
The compiler could have instead moved the result of foo() into edx (or any other free register), like this:
subl $12, %esp
call foo
mov %eax, %edx
mov $0, %eax
add (%edx), %eax
addl $12, %esp
ret
This example used a memory constraint for an input operand, but the concept applies equally to outputs.
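For reference, the fixed source might look like the sketch below, assuming GCC or Clang targeting x86 with AT&T syntax. The only change relative to bar() above is "=&r" in place of "=r" (plus an explicit "cc" clobber, since add modifies the flags). The names storage and bar_early_clobber, and this definition of foo(), are made up for illustration; the non-x86 branch is a plain C fallback so the snippet still compiles on other targets.

```c
/* Hypothetical stand-in for the external foo() used above. */
static int storage = 42;

int *foo(void)
{
    return &storage;
}

int bar_early_clobber(void)
{
    int r;
#if defined(__i386__) || defined(__x86_64__)
    /* "=&r": %0 is written before %1 is read, so the compiler must not
       put %0 in a register that is also used to form the address of %1. */
    __asm__(
        "movl $0, %0 \n\t"
        "addl %1, %0"
        : "=&r" (r)
        : "m" (*foo())
        : "cc");
#else
    /* Fallback for non-x86 targets: same result in plain C. */
    r = 0 + *foo();
#endif
    return r;
}
```

With the early-clobber modifier in place, the value loaded through the pointer should come back unchanged regardless of which registers the compiler picks.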
Given the code below, Apple Clang 11 with -O3 uses (%rax) for a and %eax for b.
void foo(int *a)
{
    __asm__(
        "nop # a is %[a].\n"
        "nop # b is %[b].\n"
        "nop # c is %[c].\n"
        "nop # d is %[d].\n"
        "nop # e is %[e].\n"
        "nop # f is %[f].\n"
        "nop # g is %[g].\n"
        "nop # h is %[h].\n"
        "nop # i is %[i].\n"
        "nop # j is %[j].\n"
        "nop # k is %[k].\n"
        "nop # l is %[l].\n"
        "nop # m is %[m].\n"
        "nop # n is %[n].\n"
        "nop # o is %[o].\n"
        :
        [a] "=m" (a[ 0]),
        [b] "=r" (a[ 1]),
        [c] "=r" (a[ 2]),
        [d] "=r" (a[ 3]),
        [e] "=r" (a[ 4]),
        [f] "=r" (a[ 5]),
        [g] "=r" (a[ 6]),
        [h] "=r" (a[ 7]),
        [i] "=r" (a[ 8]),
        [j] "=r" (a[ 9]),
        [k] "=r" (a[10]),
        [l] "=r" (a[11]),
        [m] "=r" (a[12]),
        [n] "=r" (a[13]),
        [o] "=r" (a[14])
    );
}
So, if the nop instructions and comments were replaced with actual instructions that wrote to %[b] before %[a], they would destroy the address needed for %[a].
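A reduced version of that situation, with real instructions and the early-clobber fix applied, could be sketched as follows. It assumes GCC or Clang on x86 with AT&T syntax; store_pair is a hypothetical helper name, and the non-x86 branch is a plain C fallback. The asm writes the register output %[b] first and only then stores through the memory output %[a], which is exactly the ordering the manual warns about, so %[b] is declared "=&r".

```c
/* Hypothetical helper: stores 1 in a[0] (memory operand) and
   2 in a[1] (register operand). */
void store_pair(int *a)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "movl $2, %[b] \n\t"  /* writes the register output first */
        "movl $1, %[a]"       /* then uses the address of the memory output */
        : [a] "=m" (a[0]),
          [b] "=&r" (a[1]));  /* '&' keeps %[b] out of the register
                                 that holds the address of %[a] */
#else
    /* Fallback for non-x86 targets: same effect in plain C. */
    a[0] = 1;
    a[1] = 2;
#endif
}
```

Without the '&', the compiler would be free to hold the pointer a and the output %[b] in the same register, and the first movl would overwrite the address before the second one uses it.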