
fp:precise vs. fp:strict performance

I noticed differences in my program's results between the Release and Debug builds. After some research I realized that floating-point optimizations were causing those differences. I solved the problem by using the fenv_access pragma to disable those optimizations for some critical methods.
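For reference, the fix looks roughly like this; a minimal sketch, where the function name and body are just placeholders:

    // Disable FP optimizations for one critical routine (MSVC).
    // Inside this region the optimizer won't constant-fold, hoist,
    // or common-subexpression-eliminate the floating-point work.
    #pragma fenv_access (on)

    double critical_sum(const double* values, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += values[i];
        return sum;
    }

    #pragma fenv_access (off)   // optimizations allowed again below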

Thinking about it, I realized that, given my program's characteristics, it is probably better to use the fp:strict model instead of fp:precise, but I am worried about performance. I have tried to find information about the performance cost of fp:strict, or about the performance difference between the precise and strict models, but I have found very little.

Does anyone know anything about this?

Thanks in advance.

asked Jun 21 '11 by Alex


3 Answers

This happens because you are compiling in 32-bit mode, which uses the x87 floating-point unit. The code optimizer removes redundant moves from the FPU registers to memory and back, leaving intermediate results on the FPU stack. A pretty important optimization.

Problem is, the FPU stores doubles with 80 bits of precision instead of the 64 bits of a double. Intel originally intended this as a feature, producing more accurate intermediate calculations, but it is really a bug. They didn't make the same mistake when they designed the SSE2 instruction set, which 64-bit compilers use for floating-point math: the XMM registers hold 64-bit doubles.

So in the Release build you get subtly different results, since the calculations are performed with more bits. This should never matter in a program that computes with floating-point values; a double can only store about 15 significant digits anyway. What differ are the noise digits, the ones beyond the first 15, though sometimes fewer if your calculation loses significant digits badly, like calculating 1 - 3 * (1/3.0).
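To see it for yourself, a tiny program like this (just a sketch) prints the residue; a 32-bit x87 build that keeps intermediates at 80 bits can print a tiny nonzero value where an SSE2 or 64-bit build prints 0:

    #include <cstdio>

    int main()
    {
        double third = 1 / 3.0;
        double residue = 1 - 3 * third;   // 0 in exact arithmetic
        printf("%.17g\n", residue);       // the noise digits, if any
        return 0;
    }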

But yeah, you can use fp:precise to get consistent noise digits. It forces the intermediate values to be flushed to memory so they cannot remain in the FPU with 80 bits of precision. It makes your code slower, of course.

answered by Hans Passant


I am not sure if this is a solution, but it is what I have :) As I posted previously, I wrote a test program that performs floating-point operations said to be optimized under fp:precise but not under fp:strict, and then measured performance. I ran it 10,000 times and, on average, fp:strict was 2.85% slower than fp:precise.
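The harness was essentially this shape (a sketch with an invented kernel, not the one behind the 2.85% figure; build it twice, once with /fp:precise and once with /fp:strict, then compare the averages):

    #include <chrono>
    #include <cstdio>
    #include <vector>

    static double kernel(const std::vector<double>& v)
    {
        double acc = 0.0;
        for (double x : v)
            acc = acc * 0.9999999 + x;   // dependent floating-point chain
        return acc;
    }

    int main()
    {
        const int runs = 10000;
        std::vector<double> data(1 << 16, 0.25);
        double sink = 0.0;               // keeps the work from being optimized away

        auto t0 = std::chrono::steady_clock::now();
        for (int run = 0; run < runs; ++run)
            sink += kernel(data);
        auto t1 = std::chrono::steady_clock::now();

        std::chrono::duration<double, std::milli> ms = t1 - t0;
        printf("average: %.4f ms per run (sink=%g)\n", ms.count() / runs, sink);
        return 0;
    }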

answered by Alex


Just offering my two cents:

I have an image-processing program that auto-vectorizes; the aim was to compare performance and accuracy, taking MATLAB as the gold standard.

Using VS2012 and an Intel i950.

Critical-region error and runtime:

    Model     Error          Runtime
    strict    2.3328196e-02  465 ms
    precise   7.1277611e-02  182 ms
    fast      7.1277611e-02  188 ms

Under strict the code did not vectorize.
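You can check what the vectorizer did in each mode with the VS2012 compiler's /Qvec-report switch, which prints a message per loop (level 2 also gives a reason code for loops that failed to vectorize); the source file name here is made up:

    cl /O2 /fp:precise /Qvec-report:2 filter.cpp
    cl /O2 /fp:strict  /Qvec-report:2 filter.cpp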

Using strict slowed the code down by more than 2x, which was not acceptable.

answered by Mikhail