My program spends 90% of its CPU time in the std::pow(double, int) function. Accuracy is not a primary concern here, so I was wondering whether there are any faster alternatives. One thing I was thinking of trying is casting to float, performing the operation, and then casting back to double (I haven't tried this yet); I am concerned that this is not a portable way of improving performance (don't most CPUs operate on doubles natively anyway?).
Cheers
With regard to the C standard library sqrt and pow, the answer is no. First, if pow(x, .5f) were faster than an implementation of sqrt(x), the engineer assigned to maintain sqrt would replace the implementation with pow(x, .5f).
C++11 Performance Tip: Update on When to Use std::pow
When not compiling with -ffast-math, direct multiplication was significantly faster than std::pow, around two orders of magnitude faster when comparing x * x * x and std::pow(x, 3).
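For an integer exponent that is not known at compile time, the same idea of replacing std::pow with plain multiplications can be generalized with exponentiation by squaring. The following is only a minimal sketch: the name powi is hypothetical (it is not a standard library function), and the result may differ from std::pow in the last few bits, which should be acceptable if accuracy is not a primary concern.

```cpp
// Hypothetical helper (not part of the standard library): computes
// base^exp for an integer exponent using exponentiation by squaring,
// which needs O(log |exp|) multiplications.
inline double powi(double base, int exp) {
    // Take the magnitude as unsigned so that INT_MIN does not overflow.
    unsigned int n = exp < 0 ? 0u - static_cast<unsigned int>(exp)
                             : static_cast<unsigned int>(exp);
    double result = 1.0;
    while (n != 0) {
        if (n & 1u) {
            result *= base;  // this bit of the exponent is set
        }
        base *= base;        // square the base for the next bit
        n >>= 1;
    }
    // A negative exponent means the reciprocal of the positive power.
    return exp < 0 ? 1.0 / result : result;
}
```

If the exponent is a small compile-time constant such as 3, writing x * x * x directly is likely to be at least as fast; whether powi actually beats std::pow on a given compiler and flag set would need to be measured.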
pow() is a function that computes the power of a number; to use it in C/C++ we have to #include <math.h>. Two numbers are passed to it. Example: pow(4, 2) gives 4^2, which is 16.
It looks like Martin Ankerl has a few articles on this. Optimized Approximative pow() in C / C++ is one, and it has two fast versions; one is as follows:
    inline double fastPow(double a, double b) {
      union {
        double d;
        int x[2];
      } u = { a };
      u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
      u.x[0] = 0;
      return u.d;
    }
which relies on type punning through a union, which is undefined behavior in C++. From the draft standard, section 9.5 [class.union]:
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...]
but most compilers, including gcc, support this with well-defined behavior:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.
but this is not universal, as this article points out, and as I point out in my answer here, using memcpy should generate identical code and does not invoke undefined behavior.
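As a rough illustration of the memcpy approach, the union version could be rewritten along the following lines. This is a sketch rather than code from the linked articles: the name fastPowMemcpy is made up here, and it assumes a 64-bit IEEE 754 double. Taking the upper 32 bits with a shift, instead of indexing x[1], also removes the dependence on the order of the two ints in memory.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical memcpy-based variant of the approximation above.
// Assumes double is a 64-bit IEEE 754 value. Like the original, it is
// only meaningful for positive finite a and moderate values of b.
inline double fastPowMemcpy(double a, double b) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");

    // Copy the object representation instead of reading an inactive
    // union member.
    std::uint64_t bits;
    std::memcpy(&bits, &a, sizeof bits);

    // Upper 32 bits: sign, exponent and the top of the mantissa
    // (this is what u.x[1] holds on a little-endian machine).
    const std::int32_t hi = static_cast<std::int32_t>(bits >> 32);

    // Same scaling as the original, with the same magic constant.
    const std::int32_t scaled = static_cast<std::int32_t>(
        b * (static_cast<double>(hi) - 1072632447.0) + 1072632447.0);

    // Put the scaled value back in the upper half; the lower 32 bits
    // are zero, matching u.x[0] = 0 in the original.
    bits = static_cast<std::uint64_t>(static_cast<std::uint32_t>(scaled)) << 32;

    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

The result is only an approximation: for example, fastPowMemcpy(2.0, 2.0) comes out at roughly 4.23 rather than 4, so it is worth checking the error against your own input range before relying on it.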
He also links to a second one, Optimized pow() approximation for Java, C / C++, and C#.
The first article also links to his microbenchmarks here.