
When using doubles, why isn't (x / (y * z)) the same as (x / y / z)? [duplicate]


This is partly academic, as for my purposes I only need it rounded to two decimal places; but I am keen to know what is going on to produce two slightly different results.

This is the test that I wrote to narrow it to the simplest implementation:

@Test
public void shouldEqual() {
  double expected = 450.00d / (7d * 60);  // 1.0714285714285714
  double actual = 450.00d / 7d / 60;      // 1.0714285714285716

  assertThat(actual).isEqualTo(expected);
}

But it fails with this output:

org.junit.ComparisonFailure:
Expected :1.0714285714285714
Actual   :1.0714285714285716

Can anyone explain in detail what is going on under the hood to result in the value at 1.000000000000000X being different?

Some of the points I'm looking for in an answer are:

  • Where is the precision lost?
  • Which method is preferred, and why?
  • Which is actually correct? (In pure maths, both can't be right. Perhaps both are wrong?)
  • Is there a better solution or method for these arithmetic operations?

asked Apr 24 '15 08:04 by Ben Pearson




2 Answers

I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.

Roundoff error in a decimal data type

As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as

3.142 x 10^0 

which would be stored in 6 digits as

503142 

The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.

Actually, that's not true. It can't store just any number in that range. What if you want to represent 3.141592? Or 3.1422034? Or 3.141588906? Tough luck, the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write

float6d x = 3.141592;
float6d y = 3.1422034;
float6d z = 3.141588906;

then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x, y, and z all compare equal.

The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.
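
The rounding described above can be imitated in Java. This is only a sketch: float6d is a made-up type, so the class name Float6dRounding and the choice to model it with BigDecimal and a 4-digit, half-even MathContext are assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class Float6dRounding {
    // Assumption: float6d keeps 4 significant decimal digits and rounds half-even.
    static final MathContext FLOAT6D = new MathContext(4, RoundingMode.HALF_EVEN);

    // Round a decimal literal the way the imagined float6d compiler would.
    static BigDecimal toFloat6d(String literal) {
        return new BigDecimal(literal).round(FLOAT6D);
    }

    public static void main(String[] args) {
        // The first three literals lie inside [3.1415, 3.1425]; the last one lies just below it.
        String[] literals = {"3.141592", "3.1415", "3.1425", "3.14149"};
        for (String literal : literals) {
            System.out.println(literal + " -> " + toFloat6d(literal));
        }
    }
}
```

The first three literals come out as 3.142, while 3.14149 falls just outside the interval and becomes 3.141.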

This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:

  • Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:

       1.210     x 10^2
    +  0.02718   x 10^2
    -------------------
       1.23718   x 10^2

    which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.

  • Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you

       1.210     x 10^2
    x  0.02718   x 10^2
    -------------------
       3.28878   x 10^2

    which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.

    However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives

       1.40      x 10^2
    x  0.023     x 10^2
    -------------------
       3.22      x 10^2

    which has no roundoff problems.

  • Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get

       3.                x 10^0
    /  7.                x 10^0
    ----------------------------
       0.428571428571... x 10^0

    The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.
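
The three cases in the list above can be reproduced with a small Java sketch. As before, float6d does not exist, so this models it with BigDecimal and a 4-digit, half-even MathContext; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class Float6dArithmetic {
    // Assumption: float6d keeps 4 significant decimal digits and rounds half-even.
    static final MathContext FLOAT6D = new MathContext(4, RoundingMode.HALF_EVEN);

    // Each operation computes the exact result and then rounds it to 4 digits.
    static BigDecimal add(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b), FLOAT6D);
    }

    static BigDecimal multiply(String a, String b) {
        return new BigDecimal(a).multiply(new BigDecimal(b), FLOAT6D);
    }

    static BigDecimal divide(String a, String b) {
        return new BigDecimal(a).divide(new BigDecimal(b), FLOAT6D);
    }

    public static void main(String[] args) {
        System.out.println("121.0 + 2.718 = " + add("121.0", "2.718"));     // exact: 123.718
        System.out.println("121 x 2.718   = " + multiply("121", "2.718")); // exact: 328.878
        System.out.println("2.3 x 140     = " + multiply("2.3", "140"));   // exact: 322
        System.out.println("3 / 7         = " + divide("3", "7"));         // exact: 0.428571...
    }
}
```

Only the 2.3 x 140 case survives without loss; the other three results are rounded to four digits, matching the worked examples.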

As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.

Analysis of roundoff error

In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation. It depends on the specific values involved. Look at this plot of the roundoff error for 2.718 z as a function of z (still using the float6d data type):

[Figure: roundoff error for multiplication by 2.718]

In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at the plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of the order 10^-1, 2.718 z is also on the order of 10^-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 z jumps up to the next order of magnitude, which is 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.
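
To get a feel for these bounds without the plot, here is a rough Java sketch that computes the signed roundoff error of 2.718 z for a few sample values of z. The sample values, the class name, and the BigDecimal model of float6d (4 digits, half-even) are all assumptions for illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class RoundoffErrorOf2718z {
    // Assumption: float6d keeps 4 significant decimal digits and rounds half-even.
    static final MathContext FLOAT6D = new MathContext(4, RoundingMode.HALF_EVEN);
    static final BigDecimal FACTOR = new BigDecimal("2.718");

    // Signed roundoff error: (product rounded to 4 digits) minus (exact product).
    static BigDecimal roundoffError(String z) {
        BigDecimal exact = FACTOR.multiply(new BigDecimal(z));
        return exact.round(FLOAT6D).subtract(exact);
    }

    public static void main(String[] args) {
        // Values on either side of z = 1/2.718 = 0.3679, where the product crosses 1.
        String[] samples = {"0.1234", "0.3000", "0.3679", "0.4567"};
        for (String z : samples) {
            System.out.println("z = " + z + "  error = " + roundoffError(z).toPlainString());
        }
    }
}
```

For z below 0.3679 the error stays within 0.00005 in magnitude; above it, the bound widens to 0.0005 because the product now has one digit in front of the decimal point.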

You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).

[Figure: relative error in values between 0.1 and 1]

Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by

sqrt(0.0001^2 + 0.0001^2) = 0.0001414 

which is a factor of sqrt(2) larger than the relative error in each of the individual values.

When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is

sqrt(0.0001^2 + 0.0001414^2) = 0.0001732 

Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.

Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is

sqrt(0.0001414^2 + 0.0001^2) = 0.0001732 

So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)
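
The quadrature arithmetic above is easy to check in Java. This sketch simply repeats the three calculations; the class and method names are made up, and 0.0001 is the rough average relative error assumed in the text.

```java
class RelativeErrorPropagation {
    // Combine two independent relative errors in quadrature: sqrt(a^2 + b^2).
    static double combine(double relErrA, double relErrB) {
        return Math.hypot(relErrA, relErrB);
    }

    public static void main(String[] args) {
        double eps = 0.0001; // assumed average relative error of x, y, and z

        double product = combine(eps, eps);          // error in x * y
        double multiplyFirst = combine(eps, product); // error in z / (x * y)

        double quotient = combine(eps, eps);          // error in z / x
        double divideFirst = combine(quotient, eps);  // error in z / x * y

        System.out.printf("x * y       : %.7f%n", product);
        System.out.printf("z / (x * y) : %.7f%n", multiplyFirst);
        System.out.printf("z / x * y   : %.7f%n", divideFirst);
    }
}
```

Both orderings land on about 0.0001732, which is sqrt(3) times the single-value error, consistent with the square-root growth described above.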

Gory details

You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form

1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
                        52 bits                              11 bits

The leading 1 is not explicitly stored, and constitutes a 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 2^2. The mantissa is 1.75 in binary, or 1.11, and the exponent is 1023 + 2 = 1025 in binary, or 10000000001, so the content stored in memory is

0 10000000001 1100000000000000000000000000000000000000000000000000
^ ^           ^
| exponent    mantissa
sign

but that doesn't really matter.
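
If you would rather inspect the stored bits from Java than by hand, something along these lines should work. Double.doubleToLongBits and Math.getExponent are standard library calls; the class and helper names are hypothetical.

```java
class DoubleBits {
    // The 11 stored exponent bits (real exponent plus 1023).
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    // The 52 stored mantissa bits (without the implicit leading 1).
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0x000FFFFFFFFFFFFFL;
    }

    // All 64 bits as "sign exponent mantissa", separated by spaces.
    static String layout(double d) {
        String bits = Long.toBinaryString(Double.doubleToLongBits(d));
        String padded = String.format("%64s", bits).replace(' ', '0');
        return padded.substring(0, 1) + " " + padded.substring(1, 12) + " " + padded.substring(12);
    }

    public static void main(String[] args) {
        for (double d : new double[] {7.0, 450.0, 60.0}) {
            System.out.println(d + ": " + layout(d)
                    + "  (real exponent " + Math.getExponent(d) + ")");
        }
    }
}
```

For 7.0 this reports a stored exponent of 1025, that is, a real exponent of 2, and a mantissa field beginning with 11.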

Your example also involves 450,

1.1100001000000000000000000000000000000000000000000000 x 2^00000001000 

and 60,

1.1110000000000000000000000000000000000000000000000000 x 2^00000000101 

You can play around with these values using this converter or any of many others on the internet.

When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or

1.1010010000000000000000000000000000000000000000000000 x 2^00000001000 

Then it divides 450 by 420. This produces 15/14, which is

1.0001001001001001001001001001001001001001001001001001001001001001001001... 

in binary. Now, the Java language specification says that

Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.

and the nearest representable value to 15/14 in 64-bit IEEE 754 format is

1.0001001001001001001001001001001001001001001001001001 x 2^00000000000 

which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)

On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,

1000000.01001001001001001001001001001001001001001001001001001001001001001... 

for which the nearest representable value is

1.0000000100100100100100100100100100100100100100100101 x 2^00000000110 

which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you

1.000100100100100100100100100100100100100100100100100110011001100110011... 

Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is

1.0001001001001001001001001001001001001001001001001010 x 2^00000000000 

which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.

The specific rounding that causes this difference should be clear if you look at the exact binary values:

1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
                                                     ^ last bit of mantissa

It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.
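
You can confirm this from Java itself. The sketch below evaluates both expressions from the question, prints their raw bits, and measures how far each double's exact binary value is from 15/14. The class and method names are made up; new BigDecimal(double) is used because it converts a double to its exact decimal expansion.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class DivisionOrder {
    static double multiplyFirst() {
        return 450.00d / (7d * 60);
    }

    static double divideTwice() {
        return 450.00d / 7d / 60;
    }

    // Distance between a double's exact value and 15/14 (computed to 40 digits).
    static BigDecimal errorFromExact(double d) {
        BigDecimal exact = new BigDecimal(15).divide(new BigDecimal(14), new MathContext(40));
        return new BigDecimal(d).subtract(exact).abs();
    }

    public static void main(String[] args) {
        double a = multiplyFirst();
        double b = divideTwice();
        System.out.println("450 / (7 * 60) = " + a
                + "  bits " + Long.toHexString(Double.doubleToLongBits(a)));
        System.out.println("450 / 7 / 60   = " + b
                + "  bits " + Long.toHexString(Double.doubleToLongBits(b)));
        System.out.println("error of first : " + errorFromExact(a));
        System.out.println("error of second: " + errorFromExact(b));
    }
}
```

The raw bit patterns differ by exactly one in the last place, and the error of 450 / (7 * 60) comes out several times smaller than that of 450 / 7 / 60.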

answered Oct 24 '22 14:10 by David Z


It has to do with how the double type is implemented and the fact that floating-point types don't make the same precision guarantees as other, simpler numerical types. Although the following answer is more specifically about sums, it also answers your question by explaining that there is no guarantee of infinite precision in floating-point mathematical operations: Why does changing the sum order returns a different result?

Essentially, you should never test floating-point values for equality without specifying an acceptable margin of error. Google's Guava library includes DoubleMath.fuzzyEquals(double, double, double) to determine whether two double values are equal within a given tolerance. If you wish to read up on the specifics of floating-point equality, this site is quite useful; the same site also explains floating-point rounding errors.

In summary: the expected and actual values of your calculation differ because the intermediate results are rounded differently depending on the order of operations.
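
If you don't want to pull in Guava, a tolerance check is easy to sketch in plain Java. The helper names below are hypothetical, the tolerance of 1e-9 is an arbitrary choice, and unlike a library routine this version makes no attempt to treat NaN or infinities specially. Since the question only needs two decimal places, the sketch also shows rounding both results with BigDecimal.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class FuzzyCompare {
    // Hypothetical helper: true if a and b differ by at most the given tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    // Round to two decimal places, as the question says is sufficient.
    static BigDecimal roundToTwoPlaces(double d) {
        return BigDecimal.valueOf(d).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        double expected = 450.00d / (7d * 60);
        double actual = 450.00d / 7d / 60;

        System.out.println("exactly equal:      " + (expected == actual));
        System.out.println("equal within 1e-9:  " + nearlyEqual(expected, actual, 1e-9));
        System.out.println("rounded to 2 places: " + roundToTwoPlaces(expected)
                + " and " + roundToTwoPlaces(actual));
    }
}
```

With a tolerance like this, the original test would pass even though the two doubles are not bit-for-bit identical.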

answered Oct 24 '22 14:10 by Emily Mabrey