In C code it is common to write `a = b*b;` instead of `a = pow(b, 2.0);` for `double` variables. I get that since `pow` is a generic function capable of handling non-integer exponents, one would naïvely think that the first version is faster. I wonder, however, whether the compiler (gcc) transforms calls to `pow` with integer exponents into direct multiplication as part of any of its optional optimizations.

Assuming that this optimization does not take place, what is the largest integer exponent for which it is faster to write out the multiplication manually, as in `b*b* ... *b`?
I know that I could make performance tests on a given machine to figure out whether I should even care, but I would like to gain some deeper understanding on what is "the right thing" to do.
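As background for "writing out the multiplication manually": the number of multiplies needed for an integer power grows only logarithmically if you square as you go. The helper below (`ipow` is a hypothetical name, not from the question) is a minimal sketch of that exponentiation-by-squaring idea, which is essentially what a compiler emits when it does optimize an integer-exponent `pow` call.

```c
#include <math.h>

/* Hypothetical helper: integer power by repeated squaring.
   For exponent n this uses O(log n) multiplications, so even a
   32-bit exponent never needs more than ~31 multiplies -- far
   fewer than writing b*b*...*b literally for large n. */
static double ipow(double b, unsigned n) {
    double r = 1.0;
    while (n) {
        if (n & 1)   /* current binary digit of n is set */
            r *= b;
        b *= b;      /* square the base for the next digit */
        n >>= 1;
    }
    return r;
}
```

For example, `ipow(2.0, 10)` performs 4 squarings plus 2 result multiplies instead of 9 sequential multiplications.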
What you want is `-ffinite-math-only` and `-ffast-math`, and possibly `#include <tgmath.h>`. This is the same as `-Ofast` without mandating the `-O3` optimizations.

Not only do `-ffinite-math-only` and `-ffast-math` enable these kinds of optimizations, the type-generic math header also compensates for when you forget to append the proper suffix to a (non-`double`) math function.
For example:
#include <tgmath.h>
float pow4(float f){return pow(f,4.0f);}
//compiles to
pow4:
vmulss xmm0, xmm0, xmm0
vmulss xmm0, xmm0, xmm0
ret
For clang this works for powers up to 32, while gcc does it for powers up to at least 2,147,483,647 (that's as far as I checked) unless `-Os` is enabled (because a `jmp` to the `pow` function is technically smaller); with `-Os`, it will only do this for a power of 2.
WARNING: `-ffast-math` is just a convenience alias for several other optimizations, many of which break standards conformance (strict IEEE 754 semantics and parts of ISO C). If you would rather use only the minimal flags needed to get this behavior, you can use `-fno-math-errno -funsafe-math-optimizations -ffinite-math-only`.