Let N be a compile-time unsigned integer.
GCC can optimize
unsigned sum = 0;
for(unsigned i=0; i<N; i++) sum += a; // a is an unsigned integer
to simply a*N. This can be understood since modular arithmetic says (a%k + b%k)%k = (a+b)%k.
However GCC will not optimize
float sum = 0;
for(unsigned i=0; i<N; i++) sum += a; // a is a float
to a*(float)N.
But by using associative math, e.g. with -Ofast, I discovered that GCC can reduce this to on the order of log2(N) steps. For example, for N=8 it can do the sum in three additions:
sum = a + a
sum = sum + sum // (a + a) + (a + a)
sum = sum + sum // ((a + a) + (a + a)) + ((a + a) + (a + a))
Though at some point after N=16, GCC goes back to doing N-1 sums.
My question is: why does GCC not do a*(float)N with -Ofast?
Instead of being O(N) or O(log(N)), it could be simply O(1). Since N is known at compile time, it's possible to determine if N fits in a float. And even if N is too large for a float, it could do sum = a*(float)(N & 0x0000ffff) + a*(float)(N & 0xffff0000). In fact, I did a little test to check the accuracy, and a*(float)N is more accurate anyway (see the code and results below).
//gcc -O3 foo.c
//don't use -Ofast or -ffast-math or -fassociative-math
#include <stdio.h>

float sumf(float a, int n)
{
    float sum = 0;
    for(int i=0; i<n; i++) sum += a;
    return sum;
}

float sumf_kahan(float a, int n)
{
    float sum = 0;
    float c = 0;
    for(int i=0; i<n; i++) {
        float y = a - c;
        float t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}

float mulf(float a, int n)
{
    return a*n;
}

int main(void)
{
    int n = 1<<24;
    float a = 3.14159;
    float t1 = sumf(a,n);
    float t2 = sumf_kahan(a,n);
    float t3 = mulf(a,n);
    printf("%f %f %f\n",t1,t2,t3);
}
The result is 61848396.000000 52707136.000000 52707136.000000, which shows that the multiplication and the Kahan summation give the same result. I think this shows that the multiplication is more accurate than the simple sum.
An integer multiplication always needs a "carry propagate add" step at the end. Consequently, addition is always faster because that's the final step of a multiplication. (Floating point is a little different, but not significantly so).
Multiplication is faster than division. A floating-point multiplication of two numbers m_a * 2^(e_a) and m_b * 2^(e_b) can be expressed as (m_a * m_b) * 2^(e_a + e_b): the mantissas are multiplied and the exponents are added.
Floating-point addition, subtraction, and multiplication of integral values will be exact as long as the inputs are exact and the results are small enough.
There are some fundamental differences between
float funct( int N, float sum )
{
    float value = 10.0;
    for( int i = 0; i < N; i++ ) {
        sum += value;
    }
    return sum;
}
and
float funct( int N, float sum )
{
    float value = 10.0;
    sum += value * N;
    return sum;
}
When sum grows to more than value / FLT_EPSILON (so that value falls below one ULP of sum), the repeated addition tends towards a no-op: any larger value of N results in no further change to sum. For the multiplication choice, the whole product (value * N) would need to be smaller than sum * FLT_EPSILON for the single addition to be a no-op.
So the compiler can't make the optimization, because it can't tell if you wanted the exact behavior (where multiply is better), or the implemented behavior, where the scale of sum affects the result of the addition.