
std::pow very different behavior for different exponents

Tags:

c++

performance

numerical-methods

I am currently trying to optimize some code where 50% of the time is spent in std::pow(). I know that the exponent will always be a positive integer, and the base will always be a double in the interval (0, 1). For fun, I wrote a function:

inline double int_pow(double base, int exponent)
{
    double out = 1.0;
    for(int i = 0; i < exponent; i++)
    {
        out *= base;
    }

    return out;
}

I am compiling with:

> g++ fast-pow.cpp -O3 --std=c++11

I generated 100 million doubles between (0, 1) and compared the timings of (1) std::pow (2) my homemade int_pow function from above and (3) direct multiplication. Here's a sketch of my timing routine (this is a very quickly put-together test):

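#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>

void time_me(int exp, size_t reps)
{
    // volatile keeps the compiler from optimizing the computation away
    volatile double foo = 0.0;
    double base = 0.0;

    size_t i;
    for (i = 0; i < reps; ++i)
    {
        // note: the + 1 puts base in [1, 2], not in (0, 1) as described above
        base = ((double) rand() / (RAND_MAX)) + 1;
        foo = pow(base, exp);
        // foo = int_pow(base, exp);
        // foo = base * base * base;
    }

    // check that the loop made it to the end
    std::cout << foo << "  " << i <<  std::endl;
}

int main()
{
    std::clock_t start;

    start = std::clock();
    time_me(3, 1e8);
    // elapsed CPU time, printed in milliseconds
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << std::endl;

    return 0;
}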

Here are the timings I've observed for various exponents:

0: std::pow 0.71s, int_pow 0.77s
2: std::pow 1.31s, int_pow 0.80s, direct mult 0.86s
3: std::pow 6.9s (!!), int_pow 0.84s, direct mult 0.76s
5: similar to 3

My Questions

So with this, my questions are:

1. Why does the performance of std::pow appear to degrade so badly for powers greater than 2?
2. Is there an existing faster power function when the base or exponent types are known ahead of time?
3. Is there something completely obvious I'm overlooking? I'm about to go through and gut std::pow for the cases with known integer exponents, and would hate to have missed something completely trivial.

Thanks!!

791

asked Jun 27 '16 17:06

MAB

1 Answer

std::pow() is a general-purpose function designed to accept any pair of floating-point values. It performs expensive computations and should be considered a slow function. However, apparently a lot of people have abused it for squaring, so the implementation of pow() in the IBM Accurate Mathematical Library (which is used by glibc) was optimized for that particular case:

sysdeps/ieee754/dbl-64/e_pow.c:

double
__ieee754_pow (double x, double y)
{
  ...
  ...
  if (y == 1.0)
    return x;
  if (y == 2.0)
    return x * x;
  if (y == -1.0)
    return 1.0 / x;
  if (y == 0)
    return 1.0;

As you can see, exponent values 0, 1 and -1 are also handled specially, but those, at least, are mathematically significant special cases, whereas squaring is merely a statistically significant case that shouldn't otherwise deserve special handling.

EDIT: Exponent values 0, 1, 2, and -1 are the only ones that allow expressing std::pow(x,n) with (much faster) arithmetic operations without any loss of accuracy. See this answer for more details. Thus the exponent value of 2 is not just a statistically significant case. END EDIT
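To get a feel for the accuracy point, here is a minimal sketch (not taken from glibc; the function names are made up for illustration) that compares x * x * x, which rounds after each of its two multiplications, against std::pow(x, 3) over evenly spaced bases in (0, 1). How many samples differ, and by how much, depends on the libm in use, so treat the numbers you get as a property of your platform rather than a general result.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: counts the sample bases in (0, 1) for which
// x * x * x and std::pow(x, 3) do not produce the same double.
inline std::size_t count_cube_mismatches(std::size_t samples) {
    std::size_t mismatches = 0;
    for (std::size_t i = 1; i <= samples; ++i) {
        const double x = static_cast<double>(i) / static_cast<double>(samples + 1);
        if (x * x * x != std::pow(x, 3.0)) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Hypothetical helper: largest relative difference between the two results
// over the same sample bases, using std::pow as the reference value.
inline double max_cube_relative_diff(std::size_t samples) {
    double worst = 0.0;
    for (std::size_t i = 1; i <= samples; ++i) {
        const double x = static_cast<double>(i) / static_cast<double>(samples + 1);
        const double reference = std::pow(x, 3.0);
        const double diff = std::fabs(x * x * x - reference) / reference;
        if (diff > worst) {
            worst = diff;
        }
    }
    return worst;
}
```

Calling count_cube_mismatches(100000) and max_cube_relative_diff(100000) shows how often the cheap version deviates and that the deviation stays within a few units in the last place, which is the "slight accuracy loss" you accept when replacing std::pow with multiplications.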

If you want a fast alternative to std::pow() for non-negative integer values of the exponent and don't care about the slight accuracy loss, then:

1. for sufficiently small values of the exponent, use your implementation of int_pow();
2. otherwise, use the exponentiation-by-squaring approach.

The boundary value of the exponent for selecting between the 1st and 2nd methods must be found via careful benchmarking.

137

answered Sep 27 '22 02:09

Leon
