Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

Tags: floating-point, compiler-optimization, gcc, assembly, fast-math

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow, which greatly slows down the performance. (In contrast, Intel C++ Compiler, executable icc, will eliminate the library call for pow(a,6).)

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:

    movapd  %xmm14, %xmm13
    mulsd   %xmm14, %xmm13
    mulsd   %xmm14, %xmm13
    mulsd   %xmm14, %xmm13
    mulsd   %xmm14, %xmm13
    mulsd   %xmm14, %xmm13

while if I write (a*a*a)*(a*a*a), it will produce

    movapd  %xmm14, %xmm13
    mulsd   %xmm14, %xmm13
    mulsd   %xmm14, %xmm13
    mulsd   %xmm13, %xmm13

which reduces the number of multiply instructions to 3. icc has similar behavior.
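For anyone who wants to reproduce the comparison, here is a minimal sketch of the two forms as standalone functions. The file name pow6.c and the function names pow6_chain and pow6_cubed are made up for illustration; they are not from my application.

```c
/* pow6.c: two ways of writing a^6, for comparing the generated assembly. */

/* Left-to-right chain, parsed as ((((a*a)*a)*a)*a)*a: five multiplies. */
double pow6_chain(double a)
{
    return a * a * a * a * a * a;
}

/* Explicitly shared cube: two multiplies for a^3, one more to square it. */
double pow6_cubed(double a)
{
    double a3 = a * a * a;
    return a3 * a3;
}
```

Compiling with something like gcc -O3 -S pow6.c and counting the mulsd instructions in each function should show the difference, although the exact output will depend on the compiler version and target.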

Why do compilers not recognize this optimization trick?

asked Jun 21 '11 18:06

xis

2 Answers

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.

As a result, most compilers are very conservative about reordering floating-point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example, GCC's -fassociative-math option allows it to reassociate floating-point operations, and the -ffast-math option allows even more aggressive tradeoffs of accuracy against speed.
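To make the non-associativity concrete, here is a small sketch that compares the two groupings of a three-way product. The helper name groupings_differ is hypothetical, and the volatile temporaries are my own addition to keep the compiler from holding intermediates in extended precision.

```c
/* Returns 1 if (a*b)*c and a*(b*c) produce different doubles, else 0. */
int groupings_differ(double a, double b, double c)
{
    /* volatile forces each intermediate to be stored as a real double,
       even on targets that keep extra precision in registers. */
    volatile double ab = a * b;
    volatile double bc = b * c;
    volatile double left = ab * c;   /* (a*b)*c */
    volatile double right = a * bc;  /* a*(b*c) */
    return left != right;
}
```

An extreme case is groupings_differ(1e200, 1e200, 1e-200): the left grouping overflows to infinity in the first multiply, while the right grouping stays finite. In the normal range the two results usually agree or differ only in the last bits, but that is still a different answer, which is what the compiler is not allowed to introduce silently.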

answered Sep 23 '22 05:09

6 revs, 3 users 86%

Lambdageek correctly points out that because associativity does not hold for floating-point numbers, the "optimization" of a*a*a*a*a*a to (a*a*a)*(a*a*a) may change the value. This is why it is disallowed by C99 (unless specifically allowed by the user, via compiler flag or pragma). Generally, the assumption is that the programmer wrote what she did for a reason, and the compiler should respect that. If you want (a*a*a)*(a*a*a), write that.

That can be a pain to write, though; why can't the compiler just do [what you consider to be] the right thing when you use pow(a,6)? Because it would be the wrong thing to do. On a platform with a good math library, pow(a,6) is significantly more accurate than either a*a*a*a*a*a or (a*a*a)*(a*a*a). Just to provide some data, I ran a small experiment on my Mac Pro, measuring the worst error in evaluating a^6 for all single-precision floating-point numbers in [1,2):

    worst relative error using    powf(a, 6.f): 5.96e-08
    worst relative error using (a*a*a)*(a*a*a): 2.94e-07
    worst relative error using     a*a*a*a*a*a: 2.58e-07

In this measurement, using pow instead of a multiplication tree reduces the worst-case error by more than a factor of 4. Compilers should not (and generally do not) make "optimizations" that increase error unless licensed to do so by the user (e.g. via -ffast-math).

Note that GCC provides __builtin_powi(x,n) as an alternative to pow(); it should generate an inline multiplication tree. Use that if you want to trade off accuracy for performance but do not want to enable fast-math.
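A minimal sketch of how that might look, assuming GCC or a compatible compiler such as Clang. The wrapper name pow6_builtin is hypothetical.

```c
/* GCC/Clang-specific: __builtin_powi takes a double base and an int
   exponent and makes no promises about precision or rounding. */
double pow6_builtin(double a)
{
    return __builtin_powi(a, 6);
}
```

Whether this is actually expanded into inline multiplies, rather than a call to a runtime helper, can depend on the compiler version and optimization level, so it is worth checking the assembly output before relying on it.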

answered Sep 24 '22 05:09

Stephen Canon
