Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algebraic reductions of signed integer expressions in C/C++

I wanted to see if GCC would reduce a - (b - c) to (a + c) - b with signed and unsigned integers so I created two tests

//test1.c
unsigned fooau(unsigned a, unsigned b, unsigned c) { return a - (b - c); }
signed   fooas(signed   a, signed   b, signed   c) { return a - (b - c); }
signed   fooms(signed   a) { return a*a*a*a*a*a; }
unsigned foomu(unsigned a) { return a*a*a*a*a*a; }  

//test2.c
unsigned fooau(unsigned a, unsigned b, unsigned c) { return (a + c) - b; }
signed   fooas(signed   a, signed   b, signed   c) { return (a + c) - b; }
signed   fooms(signed   a) { return (a*a*a)*(a*a*a); }
unsigned foomu(unsigned a) { return (a*a*a)*(a*a*a); }

I compiled first with gcc -O3 test1.c test2.c -S and looked at the assembly. For both tests fooau were identical however fooas was not.

As far as I understand unsigned arithmetic can be derived from the following formula

(a%n + b%n)%n = (a+b)%n

which can be used to show that unsigned arithmetic is associative. But since signed overflow is undefined behavior this equality does not necessarily hold for signed addition (i.e. signed addition is not associative) which explains why GCC did not reduce a - (b - c) to (a + c) - b for signed integers. But we can tell GCC to use this formula using -fwrapv. Using this option fooas for both tests is identical.

But what about multiplication? For both tests fooms and foomu were simplified to three multiplications (a*a*a*a*a*a to (a*a*a)*(a*a*a)). But multiplication can be written as repeated addition so using the formula above I think it can be shown that

((a%n)*(b%n))%n = (a*b)%n

which I think can also show that unsigned modular multiplication is associative as well. But since GCC used only three multiplications for foomu this shows that GCC assumes signed integer multiplication is associative.

This seems like a contradiction to me. For addition signed arithmetic was not associative but for multiplication it is.

Two questions:

  1. Is it true that addition is not associative with signed integers but multiplication is in C/C++?

  2. If signed overflow is used for optimization isn't the fact that GCC not reducing the algebraic expression a failure to optimize? Wouldn't it better better for optimization to use -fwrapv (I understand that a - (b - c) to (a + c) - b is not much of a reduction but I'm worried about more complicated cases)? Does this mean for optimization sometimes using -fwrapv is more efficient and sometimes it's not?

like image 722
Z boson Avatar asked May 21 '15 12:05

Z boson


Video Answer


2 Answers

  1. No, multiplication is not associative in signed integers. Consider (0 * x) * x vs. 0 * (x * x) - the latter has potentially undefined behavior while the former is always defined.

  2. The potential for undefined behavior only ever introduces new optimization opportunities, the classic example being optimizing x + 1 > x to true for signed x, an optimization that is not available for unsigned integers.

I don't think you can assume that gcc failing to change a - (b - c) to (a + c) - b represents a missed optimization opportunity; the two calculations compile to the same two instructions on x86-64 (leal and subl), just in a different order.

Indeed, the implementation is entitled to assume that arithmetic is associative, and use that for optimizations, since anything can happen on UB including modulo arithmetic or infinite-range arithmetic. However, you as the programmer are not entitled to assume associativity unless you can guarantee that no intermediate result overflows.

As another example, try (a + a) - a - gcc will optimize this to a for signed a as well as for unsigned.

like image 97
ecatmur Avatar answered Oct 06 '22 11:10

ecatmur


Algebraic reduction of signed integer expressions can be performed provided it has the same result for any defined set of inputs. So if the expression

a * a * a * a * a * a

is defined -- that is, a is small enough that no signed overflow occurs during the computation -- then any regrouping of the multiplications will produce the same value, because no product of less than six as can overflow.

The same would be true for a + a + a + a + a + a.

Things change if the variables multiplied (or added) are not all the same, or if the additions are intermingled with subtractions. In those cases, regrouping and rearranging the computation could lead to a signed overflow which did not occur in the canonical computation.

For example, take the expression

a - (b - c)

Algebraically, that's equivalent to

(a + c) - b

But the compiler can not do that rearrangement because it is possible that the intermediate value a+c will overflow with inputs which would not cause an overflow in the original. Suppose we had a=INT_MAX-1; b=1; c=2; then a+c results in an overflow, but a - (b - c) is computed as a - (-1), which is INT_MAX, without overflow.

If the compiler can assume that signed overflow does not trap but instead is computed modulo INT_MAX+1, then these rearrangements are possible. The -fwrapv options allows gcc to make that assumption.

like image 40
rici Avatar answered Oct 06 '22 13:10

rici