
Layman's explanation for why JavaScript has weird floating math – IEEE 754 standard [duplicate]

I never understand exactly what's going on with JavaScript when I do mathematical operations on floating-point numbers. I've been downright fearful of using decimals, to the point where I just avoid them when at all possible. However, if I knew what was going on behind the scenes with the IEEE 754 standard, then I would be able to predict what would happen; with predictability, I'll be more confident and less fearful.

Could someone give me a simple explanation (as simple as explaining binary representations of integers) of how the IEEE 754 standard works and how it gives this side effect: 0.1 + 0.2 != 0.3?

Thanks so much! :)

Sam asked Dec 03 '22

2 Answers

Decimal fractions like 0.1 can't be expressed cleanly in base 2

Let's say we want to express the decimal 0.1 in base-2. We know that it is equal to 1/10. The result of 1 divided by 10 in base-2 is 0.000110011001100... with an infinitely repeating sequence of digits.

Thus while in decimal form it's easy to represent a number like 0.1 cleanly, in base-2 you cannot express it exactly; the same goes for any fraction whose denominator (in lowest terms) is not a power of two. You can only approximate it by using as many bits as you are able to store.

Let's say for simplicity that we only have enough storage space to keep the first, say, 8 significant binary digits of that number. The digits stored would be 11001100 (along with an exponent of −3, the power of two by which the stored digits are scaled). This translates back to 0.000110011 in base-2, which in decimal is 0.099609375, not 0.1. This is the amount of error that would occur if you converted 0.1 to a theoretical floating-point variable which stores base values in 8 bits (not including the sign bit).
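A real JavaScript number is a 64-bit IEEE 754 double with 52 stored significand bits rather than 8, so the error is far smaller, but you can still surface it by asking for more digits than JavaScript normally prints:

    // 0.1 is stored as the nearest 64-bit double, which is slightly
    // larger than 0.1; toFixed(20) shows digits the default output hides.
    (0.1).toFixed(20); // "0.10000000000000000555"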

How floating-point variables store values

The IEEE 754 standard specifies a way of encoding a real number in binary, with a sign, a significand (the significant binary digits) and a binary exponent. The exponent is applied in the binary domain, meaning that you don't shift the radix point before converting to binary; you do it after.

There are different sizes of IEEE floating-point number, each one specifying how many bits are used for the significand and how many for the exponent. JavaScript numbers use the 64-bit ("double precision") size.
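You can inspect that encoding directly. Here is a small sketch (the function name and formatting are just illustrative) that uses a DataView to print the three fields of a JavaScript number:

    // Print the sign, exponent and significand fields of a 64-bit double
    // (1 sign bit, 11 exponent bits, 52 significand bits).
    function bitsOf(x) {
      const view = new DataView(new ArrayBuffer(8));
      view.setFloat64(0, x); // big-endian by default
      let bits = "";
      for (let i = 0; i < 8; i++) {
        bits += view.getUint8(i).toString(2).padStart(8, "0");
      }
      return bits.slice(0, 1) + " " + bits.slice(1, 12) + " " + bits.slice(12);
    }

    bitsOf(0.1);
    // "0 01111111011 1001100110011001100110011001100110011001100110011010"
    // The repeating 1001... pattern from 1/10 is cut off and rounded
    // after 52 bits.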

When you see 0.1 + 0.2 != 0.3, it's because you are not actually performing the calculation on 0.1 and 0.2, but on their floating-point binary approximations, which are accurate only to a certain precision. Upon converting the result back to decimal, the result won't be exactly 0.3, due to this error. Moreover, the result won't be equal to the binary approximation of 0.3 either. The actual amount of error depends on the size of the floating-point value, and thus how many bits of precision were used.
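You can see both approximations from any JavaScript console; again, toFixed(20) shows more digits than the default printout:

    0.1 + 0.2;               // 0.30000000000000004
    0.1 + 0.2 === 0.3;       // false
    (0.1 + 0.2).toFixed(20); // "0.30000000000000004441"
    (0.3).toFixed(20);       // "0.29999999999999998890"
    // The sum and the approximation of 0.3 are two different doubles.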

How rounding sometimes helps, but not in this case

In some cases, the error introduced by the conversion to binary is small enough to be rounded away when the result is converted back to decimal again, so you never even notice any difference: it looks like it just worked.

IEEE floating point has specific rules for how this rounding is to be done.

With 0.1 + 0.2 vs 0.3, however, the rounding does not cancel out the error. The result of adding the binary approximations of 0.1 and 0.2 will be different to the binary approximation of 0.3.
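For contrast, here is a case where the rounding does absorb the error: 0.5 is exactly representable in binary, and the combined error in the approximations of 0.1 and 0.4 is small enough to be rounded away:

    0.1 + 0.4 === 0.5; // true  (the error is rounded away)
    0.1 + 0.2 === 0.3; // false (the error survives the rounding)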

thomasrutter answered Dec 05 '22


It's the same reason that 1/3 + 1/3 + 1/3 != 1 if you naively convert 1/3 to 0.333 (or any finite number of 3's). 0.333 + 0.333 + 0.333 = 0.999, not 1.

In base 9 (for example), 1/3 can be represented exactly as 0.3₉, and 0.3₉ + 0.3₉ + 0.3₉ = 1.0₉. Some numbers which can be represented exactly in base 9 can't be represented exactly in base 10, and must necessarily be rounded to a number which can.

Similarly, some numbers can't be represented exactly in base 2 but can in base 10, such as 0.2.
0.2₁₀ is 0.0011001100110011...₂
If this is rounded to 0.0011₂ then:
0.0011₂ + 0.0011₂ + 0.0011₂ + 0.0011₂ + 0.0011₂ = 0.1111₂, not 1.0000₂. (0.1111₂ is 15/16.)

Since computers (at least the ones we use) do arithmetic in binary, this affects them.
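The same effect shows up in JavaScript if you add its binary approximation of 0.1 ten times:

    let sum = 0;
    for (let i = 0; i < 10; i++) sum += 0.1; // ten rounded additions
    sum;       // 0.9999999999999999
    sum === 1; // false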

Notice that the accuracy of the result increases as we use more digits. (0.33333333₁₀ + 0.33333333₁₀ + 0.33333333₁₀ = 0.99999999₁₀, which is closer to the correct answer than 0.999₁₀.)
For this reason, the error from rounding is usually very small. A double stores about 15 decimal digits, so the relative error is about 10⁻¹⁵ (more exactly, 2⁻⁵²).
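That 2⁻⁵² bound is exposed in JavaScript as Number.EPSILON:

    Number.EPSILON;              // 2.220446049250313e-16
    Number.EPSILON === 2 ** -52; // true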

Because the error is small, it doesn't usually make a difference unless:

  • Your program requires very high accuracy, or
  • You display it with a large number of decimal places (you might see a number like 0.99999999999999995622), or
  • You compare two numbers for equality (using == or !=).

Comparing non-integer numbers for equality is definitely something you should avoid, but you can use them in calculations and other comparisons (< or >) without problems (again unless your program requires very high accuracy).
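If you do need to compare two computed non-integers, one common approach (a sketch only; the name and tolerance here are illustrative, not a standard API) is to allow a small relative difference instead of demanding exact equality:

    // Treat a and b as equal if they differ by less than relTol
    // of the larger magnitude. Pick relTol to suit your accuracy needs.
    function nearlyEqual(a, b, relTol = 1e-9) {
      return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    0.1 + 0.2 === 0.3;           // false
    nearlyEqual(0.1 + 0.2, 0.3); // true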

user253751 answered Dec 05 '22