
floating point multiplication vs repeated addition

Let N be a compile-time unsigned integer constant.

GCC can optimize

unsigned sum = 0;
for(unsigned i=0; i<N; i++) sum += a; // a is an unsigned integer   

to simply a*N. This makes sense because unsigned arithmetic is modular, and modular arithmetic says (a%k + b%k)%k = (a+b)%k, so repeated addition and multiplication agree exactly.

However GCC will not optimize

float sum = 0;
for(unsigned i=0; i<N; i++) sum += a;  // a is a float

to a*(float)N.

However, by allowing associative math (e.g. with -Ofast), I discovered that GCC can reduce this in on the order of log2(N) steps. E.g. for N=8 it can do the sum in three additions:

sum = a + a
sum = sum + sum // (a + a) + (a + a)
sum = sum + sum // ((a + a) + (a + a)) + ((a + a) + (a + a))

Though at some point after N=16, GCC goes back to doing N-1 sums.

My question is why does GCC not do a*(float)N with -Ofast?

Instead of being O(N) or O(log(N)), it could simply be O(1). Since N is known at compile time, it's possible to determine whether N fits in a float. And even if N is too large for a float, it could do sum = a*(float)(N & 0x0000ffff) + a*(float)(N & 0xffff0000). In fact, I did a little test to check the accuracy, and a*(float)N is more accurate anyway (see the code and results below).

//gcc -O3 foo.c
//don't use -Ofast or -ffast-math or -fassociative-math
#include <stdio.h>   
float sumf(float a, int n)
{
  float sum = 0;
  for(int i=0; i<n; i++) sum += a;
  return sum;
}

float sumf_kahan(float a, int n)
{
  float sum = 0;
  float c = 0;                // running compensation for lost low-order bits
  for(int i=0; i<n; i++) {
    float y = a - c;          // correct the next addend
    float t = sum + y;
    c = (t - sum) - y;        // recover what sum + y lost
    sum = t;
  }
  return sum;
}

float mulf(float a, int n)
{
  return a*n;
}  

int main(void)
{
  int n = 1<<24;
  float a = 3.14159;
  float t1 = sumf(a,n);
  float t2 = sumf_kahan(a,n);
  float t3 = mulf(a,n);
  printf("%f %f %f\n",t1,t2,t3);
}

The result is 61848396.000000 52707136.000000 52707136.000000. Multiplication and Kahan summation agree, which I think shows that the multiplication is more accurate than the naive sum.

asked Oct 15 '15 by Z boson


1 Answer

There are some fundamental differences between

float funct( int N, float sum )
{
    float value = 10.0;
    for( int i = 0; i < N; i++ ) {
        sum += value;
    }
    return sum;
}

and

float funct( int N, float sum )
{
    float value = 10.0;
    sum += value * N;
    return sum;
}

Once sum grows to roughly value / FLT_EPSILON, i.e. once value falls below half an ulp of sum, the repeated addition becomes a no-op: adding value no longer changes sum. So beyond some large N, the remaining iterations of the repeated addition contribute nothing. For the multiplication version, the addition is only a no-op when the entire product (value * N) is that much smaller than sum, which is a far less likely situation.

So the compiler can't make the optimization, because it can't tell whether you wanted the exact behavior (where multiply is better) or the implemented behavior, where the magnitude of sum affects the result of each addition.

answered Oct 20 '22 by mksteve