
Performance of std::pow - cache misses?

I've been trying to optimize a numeric program of mine, and have run into something of a mystery. I'm looping over code that performs thousands of floating point operations, only one of which is a call to pow. Nevertheless, that call takes 5% of the time. That's not necessarily a critical issue, but it is odd, so I'd like to understand what's happening.

When I profiled for cache misses, VS.NET 2010RC's profiler reports that virtually all cache misses are occurring in std::pow... so... what's up with that? Is there a faster alternative? I tried powf, but that's only slightly faster; it's still responsible for an abnormal number of cache misses.

Why would a basic function like pow cause cache misses?

Edit: this is not managed code. /Oi intrinsics are enabled, but the compiler may ignore that at its option. Replacing pow(x,y) with exp(y*log(x)) gives similar performance; the only difference is that all the cache misses are now in the log function.
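For reference, the substitution mentioned in the edit can be sketched as below. The function name pow_via_exp_log is made up for illustration, and the identity only holds for x > 0; std::pow additionally handles zero, negative bases with integer exponents, and other special cases that this sketch ignores.

```cpp
#include <cmath>

// Hypothetical helper: computes x^y as exp(y * log(x)).
// Assumes x > 0; for x <= 0 std::log returns -inf or NaN,
// so the result will not match std::pow in those cases.
inline double pow_via_exp_log(double x, double y) {
    return std::exp(y * std::log(x));
}
```

For positive x the result should agree with std::pow to within a few units in the last place, which is consistent with the two variants having similar cost.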

Eamon Nerbonne asked Mar 20 '10 19:03



2 Answers

Yes, it's slow. As for why in detail, someone who feels more confident can try to explain.

Want to speed it up? If you can live with an approximation, see: http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/

Milan answered Sep 30 '22 07:09



Can you give more information on the 'x' as well as the environment where pow is evaluated?

What you are seeing might be the hardware prefetchers at work. Depending on the profiler, the 'cost' attributed to individual assembly instructions might be incorrect, and such misattribution should be even more frequent around long-latency instructions like the ones needed to evaluate pow.

In addition, I would use a real profiler like VTune/PTU rather than the one available in any Visual Studio version.
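Independently of which profiler you use, one way to cross-check its attribution is to time pow in isolation. The following is only a sketch: the function name time_pow_ns, the input values, and the fixed exponent 1.5 are arbitrary assumptions, and a tight loop like this does not reproduce the cache or pipeline state of the real program.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical helper: average wall-clock nanoseconds per std::pow call.
double time_pow_ns(std::size_t iterations) {
    volatile double sink = 0.0;   // keeps the compiler from dropping the calls
    double x = 1.0001;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = std::pow(x, 1.5);
        x += 1e-9;                // vary the input so the call is not hoisted
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return iterations ? elapsed.count() / static_cast<double>(iterations) : 0.0;
}
```

If the per-call time measured this way is far below what the profiler attributes to pow inside the real loop, that would point towards skewed attribution rather than pow itself.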

Fabien Hure answered Sep 30 '22 07:09
