
pow(NAN) is very slow

What is the reason for the catastrophic performance of pow() for NaN values? As far as I can work out, NaNs should not have an impact on performance if the floating-point math is done with SSE instead of the x87 FPU.

This seems to be true for elementary operations, but not for pow(). I compared multiplication and division of a double to squaring and then taking the square root. If I compile the piece of code below with g++ -lrt, I get the following result:

multTime(3.14159): 20.1328ms
multTime(nan): 244.173ms
powTime(3.14159): 92.0235ms
powTime(nan): 1322.33ms

As expected, calculations involving NaN take considerably longer. Compiling with g++ -lrt -msse2 -mfpmath=sse however results in the following times:

multTime(3.14159): 22.0213ms
multTime(nan): 13.066ms
powTime(3.14159): 97.7823ms
powTime(nan): 1211.27ms

The multiplication / division of NaN is now much faster (actually faster than with a real number), but squaring and taking the square root still take a very long time.

Test code (compiled with gcc 4.1.2 on 32-bit openSUSE 10.2 in VMware, CPU is a Core i7-2620M):

#include <iostream>
#include <time.h>   // clock_gettime, struct timespec, CLOCK_PROCESS_CPUTIME_ID
#include <cmath>

void multTime( double d )
{
   struct timespec startTime, endTime;
   double durationNanoseconds;

   clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);

   for(int i=0; i<1000000; i++)
   {
      d = 2*d;
      d = 0.5*d;
   }

   clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
   durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
   std::cout << "multTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
}

void powTime( double d )
{
   struct timespec startTime, endTime;
   double durationNanoseconds;

   clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);

   for(int i=0; i<1000000; i++)
   {
      d = pow(d,2);
      d = pow(d,0.5);
   }

   clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
   durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
   std::cout << "powTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
}

int main()
{
   multTime(3.14159);
   multTime(NAN);

   powTime(3.14159);
   powTime(NAN);
}

Edit:

Unfortunately, my knowledge of this topic is extremely limited, but I guess that the glibc pow() never uses SSE on a 32-bit system and instead uses the assembly in sysdeps/i386/fpu/e_pow.S. There is a function __ieee754_pow_sse2 in more recent glibc versions, but it's in sysdeps/x86_64/fpu/multiarch/e_pow.c and therefore probably only works on x86-64. However, all of this might be irrelevant here, because pow() is also a gcc built-in function. For an easy fix, see Z boson's answer.

asked Jul 24 '14 08:07 by dasdingonesin


2 Answers

"NaNs should not have an impact on performance if the floating-point math is done with SSE instead of the x87 FPU."

I'm not sure this follows from the resource you quote. In any case, pow is a C library function. It is not implemented as an instruction, even on x87. So there are 2 separate issues here - how SSE handles NaN values, and how a pow function implementation handles NaN values.

If the pow function implementation uses a different path for special values like +/-Inf or NaN, you might expect a NaN value for the base or exponent to return a value quickly. On the other hand, the implementation might not handle this as a separate case, and might simply rely on floating-point operations to propagate intermediate results as NaN values.

Starting with 'Sandy Bridge', many of the performance penalties associated with denormals were reduced or eliminated. Not all though, as the author describes a penalty for mulps. Therefore, it would be reasonable to expect that not all arithmetic operations involving NaNs are 'fast'. Some architectures might even revert to microcode to handle NaNs in different contexts.

answered Sep 29 '22 18:09 by Brett Hale


Your math library is too old. Either find another math library whose pow handles NaN better, or implement a fix like this:

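#include <cmath>

// Return a NaN operand directly so the library pow() never sees it.
inline double pow_fix(double x, double y)
{
    if(x!=x) return x;   // x is NaN (NaN is the only value unequal to itself)
    if(y!=y) return y;   // y is NaN
    return pow(x,y);
}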

Compile with g++ -O3 -msse2 -mfpmath=sse foo.cpp.
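
One caveat with the shortcut above: C99 Annex F specifies that pow(1, y) and pow(x, 0) return 1 even when the other argument is NaN, so returning the NaN operand unconditionally changes the result in those two corner cases. If that matters, a variant along the following lines should keep them. This is only a sketch: the name pow_fix_c99 is made up for illustration, and it assumes the code is not built with -ffast-math, which may allow the compiler to fold x != x to false.

```cpp
#include <cmath>

// Hypothetical helper (not part of any library): NaN early-out that keeps
// the two C99 Annex F special cases pow(1, NaN) == 1 and pow(NaN, 0) == 1.
inline double pow_fix_c99(double x, double y)
{
    if (x == 1.0 || y == 0.0) return 1.0;  // defined as 1 even for a NaN operand
    if (x != x) return x;                  // x is NaN: skip the library call
    if (y != y) return y;                  // y is NaN: skip the library call
    return std::pow(x, y);                 // ordinary arguments go to the library
}
```

The first test costs two comparisons on every call, so whether it is worth having depends on whether callers rely on those special cases.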

answered Sep 29 '22 19:09 by Z boson