Concrete example of incorrect behavior of an early-clobber affecting a memory operand's addressing mode in GCC inline asm?

The following is excerpted from the GCC manual's Extended Asm documentation, on embedding assembly instructions in C using the asm keyword:

The same problem can occur if one output parameter (a) allows a register constraint and another output parameter (b) allows a memory constraint. The code generated by GCC to access the memory address in b can contain registers which might be shared by a, and GCC considers those registers to be inputs to the asm. As above, GCC assumes that such input registers are consumed before any outputs are written. This assumption may result in incorrect behavior if the asm statement writes to a before using b. Combining the ‘&’ modifier with the register constraint on a ensures that modifying a does not affect the address referenced by b. Otherwise, the location of b is undefined if a is modified before using b.

One sentence in that paragraph says there may be "incorrect behavior" if the asm statement writes to a before using b.

I cannot figure out how such "incorrect behavior" could occur, so I would like a concrete asm example that demonstrates it and helps me understand this paragraph properly.

I can see how a problem might arise when two such asm statements run in parallel, but the paragraph does not mention a multiprocessing scenario.

If we have only one CPU with one core, can you please show asm code that may produce such incorrect behavior, that is, code where modifying a affects the address referenced by b so that the location of b is undefined?

The only assembly language I am familiar with is Intel x86 assembly, so please target that platform in the example.

340

asked May 22 '21 00:05

zzzhhh

2 Answers

Consider the following example:

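extern int *foo(void);   /* defined elsewhere; returns a pointer to an int */
int bar(void)
{
    int r;

    /* Deliberately missing the early-clobber '&' on the output:
       r may get the same register that holds the address of *foo(). */
    __asm__(
        "mov $0, %0 \n\t"   /* writes the output first... */
        "add %1, %0"        /* ...then reads the memory input */
    : "=r" (r) : "m" (*foo()));

    return r;
}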

The usual x86 calling conventions return values in the eax register. foo() therefore leaves its pointer in eax, and bar() must return r in eax, so there is a good chance the compiler decides to use eax for both operands to avoid unnecessary copying. The generated assembly may look like:

        subl    $12, %esp
        call    foo
        mov $0, %eax
        add (%eax), %eax
        addl    $12, %esp
        ret

Notice that mov $0, %eax zeroes eax before the next instruction uses it to address the input operand, so the add dereferences address 0 and the code will most likely crash. With an early-clobber output ("=&r"), you force the compiler to pick a register for r that no input uses. In my case, the resulting code was:

        subl    $12, %esp
        call    foo
        mov $0, %edx
        add (%eax), %edx
        addl    $12, %esp
        movl    %edx, %eax
        ret

The compiler could have instead moved the result of foo() into edx (or any other free register), like this:

        subl    $12, %esp
        call    foo
        mov     %eax, %edx
        mov $0, %eax
        add (%edx), %eax
        addl    $12, %esp
        ret

This example used the memory constraint for an input argument, but the concept applies equally to memory outputs.
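For reference, here is a minimal sketch of the early-clobber version of the source above. The static variable value and the body of foo() are hypothetical stand-ins so that the snippet compiles on its own, and the plain C fallback for non-x86 targets is likewise an assumption added for portability. The "cc" clobber is not in the original example; it is added here because add modifies the flags.

```c
/* Hypothetical stand-in for foo(): returns the address of a static int. */
static int value = 42;

int *foo(void)
{
    return &value;
}

int bar(void)
{
    int r;

#if defined(__i386__) || defined(__x86_64__)
    /* "=&r" marks r as early-clobber, so it may not share a register
       with any input, including a register used to address %1. */
    __asm__(
        "mov $0, %0 \n\t"   /* safe now: %0 is not part of %1's address */
        "add %1, %0"
    : "=&r" (r) : "m" (*foo()) : "cc");
#else
    r = *foo();             /* plain C fallback on non-x86 targets */
#endif

    return r;
}
```

With the stand-in foo() shown here, bar() returns 42. The only change to the asm statement that matters for correctness is "=r" becoming "=&r".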

149

answered Oct 21 '22 13:10

Jester

Given the code below, Apple Clang 11 with -O3 uses (%rax) for a and %eax for b.

void foo(int *a)
{
    __asm__(
            "nop    # a is %[a].\n"
            "nop    # b is %[b].\n"
            "nop    # c is %[c].\n"
            "nop    # d is %[d].\n"
            "nop    # e is %[e].\n"
            "nop    # f is %[f].\n"
            "nop    # g is %[g].\n"
            "nop    # h is %[h].\n"
            "nop    # i is %[i].\n"
            "nop    # j is %[j].\n"
            "nop    # k is %[k].\n"
            "nop    # l is %[l].\n"
            "nop    # m is %[m].\n"
            "nop    # n is %[n].\n"
            "nop    # o is %[o].\n"
        :
            [a] "=m" (a[ 0]),
            [b] "=r" (a[ 1]),
            [c] "=r" (a[ 2]),
            [d] "=r" (a[ 3]),
            [e] "=r" (a[ 4]),
            [f] "=r" (a[ 5]),
            [g] "=r" (a[ 6]),
            [h] "=r" (a[ 7]),
            [i] "=r" (a[ 8]),
            [j] "=r" (a[ 9]),
            [k] "=r" (a[10]),
            [l] "=r" (a[11]),
            [m] "=r" (a[12]),
            [n] "=r" (a[13]),
            [o] "=r" (a[14])
        );
}

So, if the nop instructions and comments were replaced with actual instructions that wrote to %[b] before accessing %[a], they would destroy the address needed for %[a].
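The same situation can be sketched with real instructions and only two outputs. This is an illustration under stated assumptions, not code from the answer: the function name store_and_return and the constants are hypothetical, and the plain C branch is a fallback for non-x86 targets. The asm writes the register output before it uses the memory output, which is exactly the order the manual warns about, so the register output carries the '&' modifier.

```c
/* Hypothetical helper: stores 1 in *slot and returns 2. */
int store_and_return(int *slot)
{
    int r;

#if defined(__i386__) || defined(__x86_64__)
    /* Without '&', the compiler would be free to put r in the same
       register that holds the address of *slot, and the first mov
       would then destroy that address before the second one uses it. */
    __asm__(
        "mov $2, %0 \n\t"   /* write the register output first */
        "movl $1, %1"       /* then store through the memory output */
    : "=&r" (r), "=m" (*slot));
#else
    *slot = 1;              /* plain C fallback on non-x86 targets */
    r = 2;
#endif

    return r;
}
```

Dropping the '&' may still appear to work with a given compiler and optimization level, because the register choice is up to the allocator; the modifier is what makes the result reliable.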

answered Oct 21 '22 13:10

Eric Postpischil

Tags: x86, gcc, assembly, inline-assembly, register-allocation
