Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow, which greatly slows down the performance. (In contrast, Intel C++ Compiler, executable icc, will eliminate the library call for pow(a,6).)

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:

movapd  %xmm14, %xmm13 mulsd   %xmm14, %xmm13 mulsd   %xmm14, %xmm13 mulsd   %xmm14, %xmm13 mulsd   %xmm14, %xmm13 mulsd   %xmm14, %xmm13 

while if I write (a*a*a)*(a*a*a), it will produce

movapd  %xmm14, %xmm13 mulsd   %xmm14, %xmm13 mulsd   %xmm14, %xmm13 mulsd   %xmm13, %xmm13 

which reduces the number of multiply instructions to 3. icc has similar behavior.

Why do compilers not recognize this optimization trick?

like image 731
xis Avatar asked Jun 21 '11 18:06

xis


People also ask

Does gcc optimize code?

GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O , this option increases both compilation time and the performance of the generated code.

What is gcc optimize?

The compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, gcc attempts to reduce the compilation time and to make debugging always yield the result expected from reading the source code.

Does gcc optimize inline assembly?

No. GCC passes your assembly source through the preprocessor and then to the assembler. At no time are any optimisations performed.

Do compilers Optimise code?

Compilers are free to optimize code so long as they can guarantee the semantics of the code are not changed.


2 Answers

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.

As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example: the -fassociative-math option of gcc which allows gcc to reassociate floating point operations, or even the -ffast-math option which allows even more aggressive tradeoffs of accuracy against speed.

like image 52
6 revs, 3 users 86% Avatar answered Sep 23 '22 05:09

6 revs, 3 users 86%


Lambdageek correctly points out that because associativity does not hold for floating-point numbers, the "optimization" of a*a*a*a*a*a to (a*a*a)*(a*a*a) may change the value. This is why it is disallowed by C99 (unless specifically allowed by the user, via compiler flag or pragma). Generally, the assumption is that the programmer wrote what she did for a reason, and the compiler should respect that. If you want (a*a*a)*(a*a*a), write that.

That can be a pain to write, though; why can't the compiler just do [what you consider to be] the right thing when you use pow(a,6)? Because it would be the wrong thing to do. On a platform with a good math library, pow(a,6) is significantly more accurate than either a*a*a*a*a*a or (a*a*a)*(a*a*a). Just to provide some data, I ran a small experiment on my Mac Pro, measuring the worst error in evaluating a^6 for all single-precision floating numbers between [1,2):

worst relative error using    powf(a, 6.f): 5.96e-08 worst relative error using (a*a*a)*(a*a*a): 2.94e-07 worst relative error using     a*a*a*a*a*a: 2.58e-07 

Using pow instead of a multiplication tree reduces the error bound by a factor of 4. Compilers should not (and generally do not) make "optimizations" that increase error unless licensed to do so by the user (e.g. via -ffast-math).

Note that GCC provides __builtin_powi(x,n) as an alternative to pow( ), which should generate an inline multiplication tree. Use that if you want to trade off accuracy for performance, but do not want to enable fast-math.

like image 30
Stephen Canon Avatar answered Sep 24 '22 05:09

Stephen Canon