
How to understand this GNU C inline assembly macro for PowerPC stwbrx

This macro is used to byte-swap data while transferring a message buffer. The statement left me puzzled, because I'm unfamiliar with embedded assembly code in C. It uses a PowerPC instruction:

#define ASMSWAP32(dest_addr,data) __asm__ volatile ("stwbrx %0, 0, %1" : : "r" (data), "r" (dest_addr))
user2927392 asked Dec 14 '22 19:12


2 Answers

Besides being unsafe because of a bug, this macro is also less efficient than what the compiler will generate for you.


stwbrx = store word byte-reversed. The x stands for indexed.

You don't need inline asm for this in GNU C, where you can use __builtin_bswap32 and let the compiler emit this instruction for you.

void swapstore_asm(int a, int *p) {
    ASMSWAP32(p, a);
}

void swapstore_c(int a, int *p) {
    *p = __builtin_bswap32(a);
}

Compiled with gcc 4.8.5 -O3 -mregnames, we get identical code from both functions (Godbolt compiler explorer):

swapstore_asm:
    stwbrx %r3, 0, %r4
    blr
swapstore_c:
    stwbrx %r3,0,%r4
    blr

But with a more complicated address (storing to p[off], where off is an integer function arg), the compiler knows how to use both register inputs, while your macro forces the compiler to have the address in a single register:

void swapstore_offset(int a, int *p, int off) {
    p[off] = __builtin_bswap32(a);
}

swapstore_offset:
    slwi %r5,%r5,2              # *4 = sizeof(int)
    stwbrx %r3,%r4,%r5          # use an indexed addressing mode, with both registers non-zero
    blr

swapstore_offset_asm:
    slwi %r5,%r5,2
    add %r4,%r4,%r5            # extra instruction forced by using the macro
    stwbrx %r3, 0, %r4
    blr

BTW, if you're having trouble understanding GNU C inline asm templates, looking at the compiler's asm output can be a useful way to see what gets substituted in. See How to remove "noise" from GCC/clang assembly output? for more about reading compiler asm output.


Also note that this macro is buggy: it's missing a "memory" clobber for the store. And yes, you still need that with asm volatile. The compiler doesn't assume that *dest_addr is modified unless you tell it, so it could hoist a non-volatile load of *dest_addr ahead of this insn, or more likely to be a real problem, sink a store after it. (e.g. if you zeroed a buffer before storing to it with this, the compiler might actually zero after this instruction.)

Instead of a "memory" clobber (and also leaving out volatile), you could tell the compiler which memory location you modify with an "=m" (*dest_addr) operand, either as a dummy operand or with a constraint on the addressing mode so you could use it as reg+reg. (IDK PPC well enough to know what "=m" usually expands to.)

In most cases this bug won't bite you, but it's still a bug. Upgrading your compiler version or using link-time optimization could maybe make your program buggy with no source-level changes.
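As a rough sketch of the "memory" clobber fix described above: the macro name ASMSWAP32_FIXED and the helper zero_then_store are made up for illustration, and the #else branch is only a stand-in (an assumption, not part of the original macro) so the snippet also builds on non-PowerPC hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
/* Same instruction as ASMSWAP32, plus a "memory" clobber so the compiler
   knows memory is written and keeps other loads/stores ordered around it. */
#define ASMSWAP32_FIXED(dest_addr, data)              \
    __asm__ volatile ("stwbrx %0, 0, %1"              \
                      : /* no outputs */              \
                      : "r" (data), "r" (dest_addr)   \
                      : "memory")
#else
/* Hypothetical stand-in so the sketch builds on non-PowerPC hosts. */
#define ASMSWAP32_FIXED(dest_addr, data) \
    ((void)(*(dest_addr) = __builtin_bswap32(data)))
#endif

/* Zero a buffer, then store into it: the pattern that the missing
   clobber could break if the zeroing were sunk past the asm statement. */
uint32_t zero_then_store(uint32_t value)
{
    uint32_t buf[4];
    memset(buf, 0, sizeof buf);
    ASMSWAP32_FIXED(&buf[1], value);
    assert(buf[0] == 0);   /* neighbouring elements stay zero */
    return buf[1];         /* byte-reversed copy of value */
}
```

With the clobber in place, the compiler has to treat the asm statement as something that may read or write any memory, so the zeroing of buf cannot be moved after the store and buf[1] cannot be assumed to still be zero.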

This kind of thing is why you should avoid inline asm when you can: https://gcc.gnu.org/wiki/DontUseInlineAsm

See also https://stackoverflow.com/tags/inline-assembly/info.

Peter Cordes answered Jan 04 '23 22:01


#define ASMSWAP32(dest_addr,data) ...

This part should be clear

__asm__ volatile ( ... : : "r" (data), "r" (dest_addr))

This is the actual inline assembly:

Two values are passed to the assembly code; no value is returned from it (that is what the empty space between the two colons after the actual assembly code means).

Both parameters are passed in registers ("r"). The expression %0 will be replaced by the register that contains the value of data while the expression %1 will be replaced by the register that contains the value of dest_addr (which will be a pointer in this case).

The volatile here tells the compiler that the assembly code has side effects, so it must not be optimized away. Note that volatile alone does not stop the compiler from reordering ordinary memory accesses around it.

So if you use the following code in the C source:

ASMSWAP32(&a, b);

... the following assembler code will be generated:

# write the address of a to register 5 (for example)
...
# write the value of b to register 6
...
stwbrx 6, 0, 5

So the first argument of the stwbrx instruction is the value of b and the last argument is the address of a.

stwbrx x, 0, y

This instruction writes the value in register x to the address stored in register y; however, it writes the value in "reverse endian" (on a big-endian CPU it writes the value "little endian").

The following code:

uint32_t a;
ASMSWAP32(&a, 0x12345678);

... should therefore result in a = 0x78563412.
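To check that result without PowerPC hardware, here is a small portable model of the byte-reversed store. The function names (store_swapped32, swap32_manual, swap_example) are invented for this sketch, and __builtin_bswap32 assumes GCC or clang.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of the store: reverse the four bytes, then store natively. */
void store_swapped32(uint32_t *dest, uint32_t data)
{
    *dest = __builtin_bswap32(data);   /* GCC/clang builtin */
}

/* The same reversal written with shifts and masks, for comparison. */
uint32_t swap32_manual(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Reproduces the example above and returns the value left in a. */
uint32_t swap_example(void)
{
    uint32_t a;
    store_swapped32(&a, 0x12345678u);
    assert(a == swap32_manual(0x12345678u));   /* both methods agree */
    return a;
}
```

Calling swap_example() should give 0x78563412, and applying the swap a second time restores the original value.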

Martin Rosenau answered Jan 04 '23 22:01