Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Anomaly" in signed integer in C

I'm currently writing a lecture on ARM optimization, specifically on vector machines such as NEON as the final target.

And since vector machines don't fare well with if-else slaloms, I'm trying to demonstrate how to get rid of them by bit-hacking.

I picked the "saturating absolute" function as an example for this. It's practically an ABS routine with the added functionality of capping the result at 0x7fffffff.

The biggest possible negative 32bit number is 0x80000000, and it's a very dangerous thing because val = -val; returns the same 0x80000000 as the initial value, caused by the asymmetry in the two's complement system especially for DSP operations, and thus, it has to be filtered out, mostly by "saturating".

int32_t satAbs1(int32_t val)
{
    if (val < 0) val = -val;
    if (val < 0) val = 0x7fffffff;
    return val;
}

Below is what I would write in assembly:

cmp     r0, #0
rsblts  r0, r0, #0
mvnlt   r0, #0x80000000
bx      lr

And below is what I actually get for the C code above:

satAbs1
        0x00000000:    CMP      r0,#0
        0x00000004:    RSBLT    r0,r0,#0
        0x00000008:    BX       lr

WTH? The compiler simply discarded the saturating part altogether!

The compiler seems to be ruling out val being negative after the first if statement which isn't true if it was 0x80000000

Or maybe the function should return an unsigned value?

uint32_t satAbs2(int32_t val)
{
    uint32_t result;
    if (val < 0) result = (uint32_t) -val; else result = (uint32_t) val;
    if (result == 0x80000000) result = 0x7fffffff;
    return result;
}

satAbs2
        0x0000000C:    CMP      r0,#0
        0x00000010:    RSBLT    r0,r0,#0
        0x00000014:    BX       lr

Unfortunately, it generates the exact same machine codes as the signed version: no saturation.

Again, the compiler seems to rule out the case of val being 0x80000000

Ok, let's widen the range of the second if statement:

uint32_t satAbs3(int32_t val)
{
    uint32_t result;
    if (val < 0) result = (uint32_t) -val; else result = (uint32_t) val;
    if (result >= 0x80000000) result = 0x7fffffff;
    return result;
}

satAbs3
        0x00000018:    CMP      r0,#0
        0x0000001C:    RSBLT    r0,r0,#0
        0x00000020:    CMP      r0,#0
        0x00000024:    MVNLT    r0,#0x80000000
        0x00000028:    BX       lr

Finally, the compiler seems to be doing it's job, albeit sup-optimal (an unnecessary CMP compared to the assembly version)

I can live with the compilers being sub-optimal, but what bothers me is that they are ruling out something that they shouldn't: 0x80000000

I'd even file a bug report to GCC devs on this, but I found out that Clang also rules out the case of the integer being 0x80000000, and thus I suppose I'm missing something regarding to the C standard.

Can anyone tell me where I'm mistaken?

Btw, below is what the if-less bit-hacking version looks like:

int32_t satAbs_bh(int32_t val)
{
    int32_t temp = val ^ (val>>31);
    val = temp + (val>>31);
    val ^= val>>31;
    return val;
}

satAbs_bh
        0x0000002C:    EOR      r3,r0,r0,ASR #31
        0x00000030:    ADD      r0,r3,r0,ASR #31
        0x00000034:    EOR      r0,r0,r0,ASR #31
        0x00000038:    BX       lr

Edit: I agree on this question of mine being a duplicate to some degree.
However, it is way more comprehensive including some assembly level stuff and bitmask technics that might be helpful compared to the referred one.

And below comes a workaround on this problem without mangling the compiler option; rule out the possibility of integer overflow preemptively:

int32_t satAbs4(int32_t val)
{
    if (val == 0x80000000) return 0x7fffffff;
    if (val < 0) val = -val;
    return val;
}
satAbs4
        0x0000002C:    CMP      r0,#0x80000000
        0x00000030:    BEQ      {pc}+0x10 ; 0x40
        0x00000034:    CMP      r0,#0
        0x00000038:    RSBLT    r0,r0,#0
        0x0000003C:    BX       lr
        0x00000040:    MVN      r0,#0x80000000
        0x00000044:    BX       lr

Again, the linaro GCC 7.4.1 I'm using demonstrates its shortcomings: I don't understand the BEQ in line 2. moveq r0, #0x80000001 as suggested in the source code could have saved two instructions at the end.

like image 407
Jake 'Alquimista' LEE Avatar asked Jul 26 '19 12:07

Jake 'Alquimista' LEE


1 Answers

Signed integer overflow or underflow is undefined behavior in C, meaning that you are expected to handle these edge cases yourself. In other words, as soon as the compiler has established that a certain signed integer value is positive, it doesn't care if there is a possibility that it could turn negative through UB.

For example, this code:

int test(int input)
{
    if (input > 0)
        input += 100;

    if (input > 0)
        input += 100;

    if (input > 0)
        input += 100;

    return input;
}

can legally be optimized to this:

int test(int input)
{
    if (input > 0)
        input += 300;

    return input;
}

even though the author of the initial code might have expected that input could overflow between each successive statements.

That's why an optimizing compiler sees your code as something like this:

int32_t satAbs1(int32_t val)
{
    if (val < 0) val = -val;

    // val must be positive here,
    // unless you are relying on UB

    // the following condition is 
    // therefore always false:
    // if (val < 0) val = 0x7fffffff;

    return val;
}

So, the only way to avoid UB is to avoid negating the signed integer if there is a chance that it might invoke UB, i.e.:

int32_t satAbs3_simple(int32_t val)
{
    if (val >= 0) 
        return val;

    // we know that val is negative here, 
    // but unfortunately gcc knows it as well,
    // so we'll handle the edge case explicitly

    if (val == INT32_MIN)
        return INT32_MAX;

    return -val;
}

gcc with -O2 produces code with a branch (early conditional return at bxge):

satAbs3_basic:
  cmp r0, #0
  bxge lr // return r0 if ge #0
  cmp r0, #0x80000000
  rsbne r0, r0, #0
  moveq r0, #0x7FFFFFFF
  bx lr

As @rici mentioned in the comments, if exact-width signed int types from stdint.h (intN_t) are available on your compiler, this means they have to be represented with N bits, no padding, using 2's complement.

This means that you can rewrite the code slightly to use bit masks, which might provide a slightly shorter assembly output (at least with gcc 5 or newer), still without branching:

int32_t satAbs3_c(int32_t val)
{
    uint32_t result = (uint32_t)val;
    if (result & 0x80000000) result = -result; // <-- avoid UB here by negating uint32_t
    if (result == 0x80000000) result = 0x7FFFFFFF;
    return (int32_t)result;
}

Note that an optimizing compiler should theoretically be able to produce this same output for both cases, but anyway, recent gcc versions (with -O1) for the last snippet give:

satAbs3_c:
  cmp r0, #0
  rsblt r0, r0, #0
  cmp r0, #0x80000000
  moveq r0, #0x7FFFFFFF
  bx lr

I actually believe it cannot get shorter than this (apart from the xor bit-hacking), because your initial assembly seems to lack a cmp r0, #0 instruction after rsblts (because rsblts changes r0, and cmp is the part where actual comparison takes place).

like image 152
Groo Avatar answered Oct 29 '22 18:10

Groo