Floating-point division - bias to avoid a result less than an 'exact' value

Question

I am currently tightening floating-point numerics for an estimate of a value. (It's: p(k,t) for those who are interested.) Essentially, the utility can never yield an under-estimate of this value: the security of probable prime generation depends on a numerically robust implementation. While output results agree with the published values, I have used the DBL_EPSILON value to ensure that division, in particular, yields a result that is never less than the true value:

Consider: double x, y; /* assigned some values... */

The evaluation: r = x / y; occurs frequently, but these (finite precision) results may truncate significant digits from the true result - a possibly infinite precision rational expansion. I currently try to mitigate this by applying a bias to the numerator, i.e.,

r = ((1.0 + DBL_EPSILON) * x) / y;

If you know anything about this subject, p(k,t) is typically much smaller than most estimates - but it's simply not good enough to dismiss the issue with this "observation". I can of course state:

(((1.0 + DBL_EPSILON) * x) / y) >= (x / y)

Of course, I need to ensure that the 'biased' result is greater than, or equal to, the 'exact' value. While I am certain it has to do with manipulating or scaling DBL_EPSILON, I obviously want the 'biased' result to exceed the 'exact' result by a minimum - demonstrable under IEEE-754 arithmetic assumptions.

Yes, I've looked though Goldberg's paper, and I've searched for a robust solution. Please don't suggest manipulation of rounding modes. Ideally, I'm after an answer by someone with a very good grasp on floating-point theorems, or knows of a very well illustrated example.

EDIT: To clarify, (((1.0 + DBL_EPSILON) * x) / y) or a form (((1.0 + c) * x) / y), is not a prerequisite. This was simply an approach I was using as 'probably good enough', without having provided a solid basis for it. I can state that the numerator and denominator will not be special values: NaNs, Infs, etc., nor will the denominator be zero.

Stephen Canon · Accepted Answer

First: I know that you don't want to set the rounding mode, but it really should be said that in terms of precision, as others have noted, setting the rounding mode will produce as good of an answer as possible. Specifically, assuming that x and y are both positive (which seems to be the case, but hasn't been explicitly stated in your question), the following is a standard C snippet with the desired effect[1]:

#include <math.h>
#pragma STDC FENV_ACCESS on

int OldRoundingMode = fegetround();
fesetround(FE_UPWARD);
r = x/y;
fesetround(OldRoundingMode);

Now, that aside, there are legitimate reasons not to want to change the rounding mode (some platforms don't support round-to-plus-infinity, on some platforms changing the rounding mode introduces a large serializing stall, etc etc), and your desire not to do so shouldn't be brushed aside so casually. So, respecting your question, what else can we do?

If your platform supports fused multiply-add, there's a very elegant solution available to you:

#include <math.h>
r = x/y;
if (fma(r,y,-x) < 0) r = nextafter(r, INFINITY);

On platforms with hardware fma support, this is very efficient. Even if fma( ) is implemented in software, it may be acceptable. This approach has the virtue that it will deliver the same result as would changing the rounding mode; that is, the tightest bound possible.

If your platform's C library is antediluvian and does not provide fma, there is still hope. Your claimed statement is correct (assuming no denormal values, at least -- I would need to think more about what happens for denormals); (1.0+DBL_EPSILON)*x/y really is always greater than or equal to the infinitely precise x/y. It will sometimes be one ulp larger than the smallest value with this property, but that's a very small and probably acceptable margin. The proof of these claims is pretty fussy, and probably not suitable for StackOverflow, but I'll give a quick sketch:

Ignoring denormals, it suffices to restrict ourselves to x, y in [1.0, 2.0).
(1.0 + eps)*x >= x + eps > x. To see this, observe:
```
(1.0 + eps)*x = x + x*eps >= x + eps > x.
```
Let P be the mathematically precise x/y. We have:
```
(1.0 + eps)*x/y >= (x + eps)/y = x/y + eps/y = P + eps/y
```
Now, y is bounded above by 2, so this gives us:
```
(1.0 + eps)*x/y > P + eps/2
```
which is sufficient to guarantee that the result rounds to a value >= P. This also shows us the way to a tighter bound. We could instead use nextafter(x,INFINITY)/y to get the desired effect with a tighter bound in many cases. (nextafter(x,INFINITY) is always x + ulp, whereas (1.0 + eps)*x will be x + 2ulp half of the time. If you want to avoid calling the nextafter library function, you can use (x + (0.75*DBL_EPSILON)*x) instead to get the same result, under the working assumption of positive normal values).

In order to be really pedantically correct, this would become significantly more complicated. No one really writes code like this, but it would be along these lines:

#include <math.h>
#pragma STDC FENV_ACCESS on

#if defined FE_UPWARD
    int OldRoundingMode = fegetround();
    if (OldRoundingMode < 0) goto Error;
    if (fesetround(FE_UPWARD)) goto Error;
    r = x/y;
    if (fesetround(OldRoundingMode)) goto TrulyHosed;
    return r;
TrulyHosed:
    // we established the desired rounding mode and did our computation,
    // but now we can't set it back to the original mode.  I have no idea
    // how you handle this gracefully.
Error:
#else
    // we can't establish the desired rounding mode, so fall back on
    // something else.

Floating-point division - bias to avoid a result less than an 'exact' value

Tags:

c

floating-point

floating-accuracy

ieee-754

Brett Hale

1 Answers

Stephen Canon

Recent Activity

Donate For Us

Floating-point division - bias to avoid a result less than an 'exact' value

Tags:

c

floating-point

floating-accuracy

ieee-754

Brett Hale

1 Answers

Stephen Canon

Related questions

Recent Activity

Donate For Us