Probability that a formula fails in IEEE 754

Question

On my computer, I can check that

(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)

evaluates to False.

More generally, I can estimate that the formula (a + b) + c == a + (b + c) fails roughly 17% of the time when a,b,c are chosen uniformly and independently on [0,1], using the following simulation:

import numpy as np
import numexpr

np.random.seed(0)
formula = '(a + b) + c == a + (b + c)'


def failure_expectation(formula=formula, N=10**6):
    a, b, c = np.random.rand(3, N)
    return 1.0 - numexpr.evaluate(formula).mean()
# e.g. 0.171744

I wonder if it is possible to arrive at this probability by hand, e.g. using the definitions in the floating point standard and some assumption on the uniform distribution.

Given the answer below, I assume that the following part of the original question is out of reach, at least for now.

Is there is a tool that computes the failure probability for a given formula without running a simulation.

Formulas can be assumed to be simple, e.g. involving the use of parentheses, addition, subtraction, and possibly multiplication and division.

(What follows may be an artifact of numpy random number generation, but still seems fun to explore.)

Bonus question based on an observation by NPE. We can use the following code to generate failure probabilities for uniform distributions on a sequence of ranges [[-n,n] for n in range(100)]:

import pandas as pd

def failures_in_symmetric_interval(n):
    a, b, c = (np.random.rand(3, 10**4) - 0.5) * n
    return 1.0 - numexpr.evaluate(formula).mean()

s = pd.Series({
    n: failures_in_symmetric_interval(n)
    for n in range(100)
})

The plot looks something like this:

enter image description here

In particular, failure probability dips down to 0 when n is a power of 2 and seems to have a fractal pattern. It also looks like every "dip" has a failure probability equal to that of some previous "peak". Any elucidation of why this happens would be great!

Mark Dickinson · Accepted Answer

It's definitely possible to evaluate these things by hand, but the only methods I know are tedious and involve a lot of case-by-case enumeration.

For example, for your specific example of determining the probability that (a + b) + c == a + (b + c), that probability is 53/64, to within a few multiples of the machine epsilon. So the probability of a mismatch is 11/64, or around 17.19%, which agrees with what you were observing from your simulation.

To start with, note that there's a major simplifying factor in this particular case, and that's that Python and NumPy's "uniform-on-[0, 1]" random numbers are always of the form n/2**53 for some integer n in range(2**53), and within the constraints of the underlying Mersenne Twister PRNG, each such number is equally likely to occur. Since there are around 2**62 IEEE 754 binary64 representable values in the range [0.0, 1.0], that means that the vast majority of those IEEE 754 values aren't generated by random.random() (or np.random.rand()). This fact greatly simplifies the analysis, but also means that it's a bit of a cheat.

Here's an incomplete sketch, just to give an idea of what's involved. To compute the value of 53/64, I had to divide into five separate cases:

The case where both a + b < 1 and b + c < 1. In this case, both a + b and b + c are computed without error, and (a + b) + c and a + (b + c) therefore both give the closest float to the exact result, rounding ties to even as usual. So in this case, the probability of agreement is 1.
The case where a + b < 1 and b + c >= 1. Here (a + b) + c will be the correctly rounded value of the true sum, but a + (b + c) may not be. We can divide further into subcases, depending on the parity of the least significant bits of a, b and c. Let's abuse terminology and call a "odd" if it's of the form n/2**53 with n odd, and "even" if it's of the form n/2**53 with n even, and similarly for b and c. If b and c have the same parity (which will happen half the time), then (b + c) is computed exactly and again a + (b + c) must match (a + b) + c. For the other cases, the probability of agreement is 1/2 in each case; the details are all very similar, but for example in the case where a is odd, b is odd and c is even, (a + b) + c is computed exactly, while in computing a + (b + c) we incur two rounding errors, each of magnitude exactly 2**-53. If those two errors are in opposite directions, they cancel and we get agreement. If not, we don't. Overall, there's a 3/4 probability of agreement in this case.
The case where a + b >= 1 and b + c < 1. This is identical to the previous case after swapping the roles of a and c; the probability of agreement is again 3/4.
a + b >= 1 and b + c >= 1, but a + b + c < 2. Again, one can split on the parities of a, b and c and look at each of the resulting 8 cases in turn. For the cases even-even-even and odd-odd-odd we always get agreement. For the case odd-even-odd, the probability of agreement turns out to be 3/4 (by yet further subanalysis). For all the other cases, it's 1/2. Putting those together gets an aggregate probability of 21/32 for this case.
Case a + b + c >= 2. In this case, since we're rounding the final result to a multiple of four times 2**-53, it's necessary to look not just at the parities of a, b, and c, but to look at the last two significant bits. I'll spare you the gory details, but the probability of agreement turns out to be 13/16.

Finally, we can put all these cases together. To do that, we also need to know the probability that our triple (a, b, c) lands in each case. The probability that a + b < 1 and b + c < 1 is the volume of the square-based pyramid described by 0 <= a, b, c <= 1, a + b < 1, b + c < 1, which is 1/3. The probabilities of the other four cases can be seen (either by a bit of solid geometry, or by setting up suitable integrals) to be 1/6 each.

So our grand total is 1/3 * 1 + 1/6 * 3/4 + 1/6 * 3/4 + 1/6 * 21/32 + 1/6 * 13/16, which comes out to be 53/64, as claimed.

A final note: 53/64 almost certainly isn't quite the right answer - to get a perfectly accurate answer we'd need to be careful about all the corner cases where a + b, b + c, or a + b + c hit a binade boundary (1.0 or 2.0). It would certainly be possible to do refine the above approach to compute exactly how many of the 2**109 possible triples (a, b, c) satisfy (a + b) + c == a + (b + c), but not before it's time for me to go to bed. But the corner cases should constitute on the order of 1/2**53 of the total number of cases, so our estimate of 53/64 should be accurate to at least 15 decimal places.

Of course, there are lots of details missing above, but I hope it gives some idea of how it might be possible to do this.

Probability that a formula fails in IEEE 754

Tags:

python

floating-point

numpy

probability

ieee-754

hilberts_drinking_problem

1 Answers

Mark Dickinson

Recent Activity

Donate For Us

Probability that a formula fails in IEEE 754

Tags:

python

floating-point

numpy

probability

ieee-754

hilberts_drinking_problem

1 Answers

Mark Dickinson

Related questions

Recent Activity

Donate For Us