
How to read a condition flag in ARMv7 Thumb-2 assembly?

I'm using an ARMv7 processor with Thumb-2 instructions.

I have executed an ADD, SUB or a CMP. Now I want to move the condition flag LE to r2. After this, r2 should contain either 0 or 1.

I've been looking through the Thumb-2 manual, but I haven't found a conditional MOV instruction or a special instruction to read the flags.

What's the most efficient way to do this? Thanks in advance!

Jesbus asked Jan 01 '23 11:01


2 Answers

You need to start a conditional block with an ite (if-then-else) instruction and then just use conditional assignments:

ite le        @ if-then-else (le)
movle r2, #1  @ if (le) then r2 = #1
movgt r2, #0  @         else r2 = #0

In general, you can make most instructions conditional in Thumb-2 if you prefix them with an appropriate IT instruction; a single IT covers up to four following instructions. Read the manual for details.
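To make concrete what the LE condition actually tests, here is a minimal C sketch of the flag logic. The type `apsr_flags` and the helpers `cmp_flags`, `cond_le`, `cond_gt` and `set_if_le` are hypothetical names for illustration, not part of any toolchain; the sketch assumes the standard ARM definitions LE = Z set or N != V, and GT = Z clear and N == V.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four condition flags (N, Z, C, V). */
typedef struct {
    bool n; /* negative: bit 31 of the result */
    bool z; /* zero: result == 0 */
    bool c; /* carry: for a subtraction, set when no borrow occurred */
    bool v; /* overflow: signed overflow occurred */
} apsr_flags;

/* Flags as "cmp a, b" would set them: it computes a - b and discards the result. */
static apsr_flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    apsr_flags f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);
    /* Signed overflow on subtraction: operands differ in sign and the
       result's sign differs from the first operand's sign. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* LE: signed less-than-or-equal. GT is its exact negation. */
static bool cond_le(apsr_flags f) { return f.z || (f.n != f.v); }
static bool cond_gt(apsr_flags f) { return !f.z && (f.n == f.v); }

/* C equivalent of: cmp a, b ; ite le ; movle r2, #1 ; movgt r2, #0 */
static uint32_t set_if_le(int32_t a, int32_t b)
{
    apsr_flags f = cmp_flags((uint32_t)a, (uint32_t)b);
    return cond_le(f) ? 1u : 0u;
}
```

Because GT is the negation of LE, exactly one of the two predicated moves in the ite block executes, so r2 always ends up as 0 or 1.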

fuz answered Jan 04 '23 01:01


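In ARM mode, (almost) any instruction can be predicated. In Thumb mode, that requires an it instruction, which encodes the predicate plus a then/else pattern (condition as-is or negated) for up to the next four instructions.

But in unified syntax the assembler can do that for you, without an explicit it, I think. GNU as, for example, inserts the it automatically when given -mimplicit-it=thumb or -mimplicit-it=always.

e.g. movle r0, #1 sets r0 = 1 if the LE condition is true in flags, and otherwise leaves it unchanged. So you'd need a mov r0, #0 first. If you use the 16-bit movs for that, it has to come before the flag-setting instruction, because movs itself writes the flags.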
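As a rough illustration of how an it instruction packs the predicate and the then/else pattern, here is a small C sketch of the 16-bit encoding. The function name `encode_it` and its string-based pattern argument are made up for this example; the bit layout (0xBF00, then the 4-bit condition, then a 4-bit mask) follows the architecture manual's description of IT as I understand it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the 16-bit encoding of an IT instruction,
   or 0 if the arguments are not valid for this sketch.
   cond:    4-bit condition code (0x0 = eq, 0xB = lt, 0xD = le, ...).
   pattern: letters after the first "t": "" for it, "t" for itt,
            "e" for ite, and so on, up to three letters. */
static uint16_t encode_it(unsigned cond, const char *pattern)
{
    size_t n = strlen(pattern);
    if (cond > 0xE || n > 3)
        return 0;

    unsigned low = cond & 1u; /* low bit of the condition */
    unsigned mask = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned bit;
        if (pattern[i] == 't')
            bit = low;        /* "then": same condition */
        else if (pattern[i] == 'e')
            bit = low ^ 1u;   /* "else": inverted condition */
        else
            return 0;
        mask = (mask << 1) | bit;
    }
    mask = (mask << 1) | 1u;  /* terminating 1 bit marks the block length */
    mask <<= (3 - n);         /* left-align within the 4-bit mask field */

    return (uint16_t)(0xBF00u | (cond << 4) | mask);
}
```

With this sketch, `encode_it(0xD, "e")` (ite le) yields 0xBFD4 and `encode_it(0xB, "")` (it lt) yields 0xBFB8. The sketch does not check every architectural restriction, for example that the al condition may not be combined with an else slot.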

ARM32 doesn't have a set-from-condition instruction like x86's setcc.

AArch64 does: turning a flag condition into an integer only takes a single cset instruction.

This C source:

int booleanize(int x, int y) { return x<y; }
int booleanize_u(unsigned a, unsigned b) { return a<b; }

compiles for ARM32 Thumb with clang -O3 (on the Godbolt compiler explorer) to the code below, revealing some stupid missed optimizations. gcc is similar: it makes branchy code with no -mcpu, and code even worse than clang's with -mcpu=cortex-a53. Branchy is maybe not totally unreasonable on a simple microcontroller.

@@ BAD EXAMPLE, compiler missed optimizations

@ clang7.0 -target arm -mthumb -mcpu=cortex-a53
booleanize(int, int):
    movs    r2, #0         @ movs is 16-bit, mov is a 32-bit instruction, I think.
    cmp     r0, r1
    it      lt
    movlt   r2, #1
    mov     r0, r2         @ wasted instruction because the compiler wanted to mov #0 before cmp
    bx      lr

booleanize_u(unsigned int, unsigned int):
    movs    r2, #0
    cmp     r0, r1
    it      lo
    movlo   r2, #1
    mov     r0, r2
    bx      lr

This is pretty definitely worse than the ite le / movle / movgt sequence from @fuz's answer, which needs only the compare, one ite, and two predicated instructions.

ARM-mode code-gen is more or less fine. In ARM mode, every 32-bit instruction word has 4 bits in the encoding for a predicate condition. (The default with no suffix in the asm source is al = always.)

@ gcc8.2 -O3 -mcpu=cortex-a53
booleanize(int, int):
    cmp     r0, r1
    movge   r0, #0     @ a simple mov without predication or flag-setting would work
    movlt   r0, #1
    bx      lr

booleanize_u(unsigned int, unsigned int):
    cmp     r0, r1
    movcs   r0, #0
    movcc   r0, #1
    bx      lr
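The 4-bit predicate sits in the top bits (31..28) of every ARM-mode instruction word, which a few lines of C can show. The helper names `arm_cond_field` and `arm_cond_name` are hypothetical; the example words are the usual encodings of mov r0, #1 (0xE3A00001) and movlt r0, #1 (0xB3A00001).

```c
#include <stdint.h>

/* Extract the 4-bit condition field from a 32-bit ARM-mode instruction word. */
static unsigned arm_cond_field(uint32_t insn)
{
    return insn >> 28;
}

/* Map a condition field value to its assembler suffix.
   0b1111 is not a normal condition; it is used for encodings that are
   always unconditional, so this sketch just labels it "1111". */
static const char *arm_cond_name(unsigned cond)
{
    static const char *const names[16] = {
        "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "1111"
    };
    return names[cond & 0xFu];
}
```

For instance, `arm_cond_name(arm_cond_field(0xB3A00001))` gives "lt", while the unsuffixed mov at 0xE3A00001 decodes to "al".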

AArch64 has cset, booleanization in a can.

// clang and gcc make the same efficient code
booleanize(int, int):
    cmp     w0, w1
    cset    w0, lt            // signed less-than
    ret
booleanize_u(unsigned int, unsigned int):
    cmp     w0, w1
    cset    w0, lo            // unsigned lower
    ret
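The listings above use lt for the signed function and lo (cc) for the unsigned one because the same bit patterns can compare differently under the two interpretations. A quick C check, reusing the two functions from the question's source; `results_disagree` is a hypothetical helper added only for this illustration:

```c
/* The two functions from the C source above, unchanged. */
int booleanize(int x, int y) { return x < y; }
int booleanize_u(unsigned a, unsigned b) { return a < b; }

/* Hypothetical helper: compare the same bit patterns both ways.
   Returns 1 when the signed and unsigned results disagree. */
int results_disagree(int x, int y)
{
    return booleanize(x, y) != booleanize_u((unsigned)x, (unsigned)y);
}
```

For x = -1 and y = 1 the signed compare is true, but the unsigned one is false because -1 converts to the largest unsigned value, so the compiler has to pick the condition code that matches the C type.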
Peter Cordes answered Jan 04 '23 01:01