How to deal with excess precision in floating-point computations?

In my numerical simulation I have code similar to the following snippet

double x;
do {
  x = /* some computation */;
} while (x <= 0.0);
/* some algorithm that requires x to be (precisely) larger than 0 */

With certain compilers (e.g. gcc) on certain platforms (e.g. Linux, x87 math) it is possible that x is computed in higher than double precision ("with excess precision"). (Update: When I talk of precision here, I mean precision /and/ range.) Under these circumstances it is conceivable that the comparison (x <= 0.0) returns false even though x becomes 0 the next time it is rounded down to double precision. (And there's no guarantee that x isn't rounded down at an arbitrary point in time.)

Is there any way to perform this comparison that

  • is portable,
  • works in code that gets inlined,
  • has no performance impact and
  • doesn't exclude some arbitrary range (0, eps)?

I tried to use (x < std::numeric_limits<double>::denorm_min()) but that seemed to significantly slow down the loop when working with SSE2 math. (I know that denormals can slow down a computation, but I didn't expect them to be slower to just move around and compare.)

Update: An alternative is to use volatile to force x into memory before the comparison, e.g. by writing

} while (*((volatile double*)&x) <= 0.0);

However, depending on the application and the optimizations applied by the compiler, this solution can introduce a noticeable overhead too.

Update: The problem with any tolerance is that it's quite arbitrary, i.e. it depends on the specific application or context. I'd prefer to just do the comparison without excess precision, so that I don't have to make any additional assumptions or introduce some arbitrary epsilons into the documentation of my library functions.

Asked Feb 02 '09 by Stephan


2 Answers

In your question, you stated that using volatile will work but that there'll be a huge performance hit. What about using the volatile variable only during the comparison, allowing x to be held in a register?

double x; /* might have excess precision */
volatile double x_dbl; /* guaranteed to be double precision */
do {
  x = /* some computation */;
  x_dbl = x;
} while (x_dbl <= 0.0);

You should also check whether you can speed up the comparison against the smallest subnormal value by casting it to long double explicitly and caching that value, i.e.

const long double dbl_denorm_min = static_cast<long double>(std::numeric_limits<double>::denorm_min());

and then compare

x < dbl_denorm_min

I'd assume that a decent compiler would do this automatically, but one never knows...

Answered Nov 15 '22 by Christoph


I wonder whether you have the right stopping criterion. It sounds like x <= 0 is an exception condition rather than a terminating condition, and that the terminating condition is easier to satisfy. Maybe there should be a break statement inside your while loop that stops the iteration when some tolerance is met. For example, many algorithms terminate when two successive iterations are sufficiently close to each other.

Answered Nov 15 '22 by John D. Cook