
Which values cannot be represented correctly by a double

The Double data type cannot correctly represent some base 10 values. This is because of how floating point numbers represent real numbers. What this means is that when representing monetary values, one should use the decimal value type to prevent errors. (feel free to correct errors in this preamble)

What I want to know is which values present such a problem for the Double data type on a 64-bit architecture in the standard .NET Framework (C#, if that makes a difference).

I expect the answer to be a formula or rule for finding such values, but I would also like some example values.

Gilles asked Aug 28 '12 18:08



2 Answers

Any number which cannot be written as a finite sum of powers of 2 (with positive or negative integer exponents) cannot be exactly represented as a binary floating-point number.
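The rule above can be checked mechanically. The sketch below uses Python rather than C#, on the assumption that Python's `float` is the same IEEE 754 64-bit format as a .NET `double` (true on the usual platforms). The helper names `has_power_of_two_denominator` and `is_exact_double` are made up for illustration.

```python
from fractions import Fraction


def has_power_of_two_denominator(text: str) -> bool:
    """True if the decimal literal is a finite sum of powers of 2.

    In lowest terms, such a number has a denominator that is a power of 2.
    """
    d = Fraction(text).denominator
    return d & (d - 1) == 0  # bit trick: only powers of 2 have a single set bit


def is_exact_double(text: str) -> bool:
    """True if the decimal literal is stored in a double with no error at all.

    This also accounts for the limited precision of the format, not just
    the denominator rule.
    """
    exact = Fraction(text)          # exact rational value of the literal
    stored = Fraction(float(text))  # exact rational value of the nearest double
    return exact == stored


for literal in ["0.5", "0.75", "0.625", "0.1", "19.99"]:
    print(literal, has_power_of_two_denominator(literal), is_exact_double(literal))
```

The first three literals pass both checks; `0.1` and `19.99` fail both, which is why amounts such as 19.99 are a poor fit for a double.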

The common IEEE formats for 32- and 64-bit floating-point numbers impose further constraints: they limit the number of binary digits in both the significand and the exponent. So there are largest and smallest representable magnitudes (the largest finite 64-bit value is approximately ±1.8 × 10^308) and limits to the precision of a number that can be represented. For 64-bit numbers the significand holds 53 binary digits (52 stored plus one implicit leading bit), so the difference between the exponents of the largest and smallest powers of 2 in a number is limited to 52. If your number includes a term in 2^52, it can't also include a term in 2^-1.
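A short sketch of the precision limit described above, again in Python on the assumption that its `float` is an IEEE 754 64-bit double like the .NET type. The variable names are illustrative only.

```python
import sys

# Properties of the platform's double format.
SIGNIFICAND_BITS = sys.float_info.mant_dig  # 53 on IEEE 754 platforms
LARGEST = sys.float_info.max                # about 1.8e308

big = 2.0 ** 52
# 2^52 + 2^-1 would need 54 significant bits, so the 0.5 is rounded away.
half_is_lost = (big + 0.5) == big

# One exponent lower, the same kind of sum spans exactly 53 bits and is exact.
half_is_kept = ((2.0 ** 51 + 0.5) - 2.0 ** 51) == 0.5

# Not every 64-bit integer survives either: 2^53 + 1 rounds to 2^53.
int_is_lost = float(2 ** 53 + 1) == float(2 ** 53)

print(SIGNIFICAND_BITS, LARGEST)
print(half_is_lost, half_is_kept, int_is_lost)
```

All three flags come out true: the same fraction that is lost next to 2^52 is kept next to 2^51.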

Simple examples of numbers which cannot be exactly represented in binary floating-point numbers include 1/3, 2/3, and 1/5. In lowest terms, each has a denominator that is not a power of 2.
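To see what is actually stored for values like these, the sketch below prints the exact value of the nearest double. It is written in Python on the assumption that its `float` matches the .NET `double` format. Python's `Decimal` plays a role loosely similar to the C# `decimal` type here, although the two types are not identical.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the stored binary value exactly, digit for digit.
stored_tenth = Decimal(0.1)
print(stored_tenth)  # 0.1000000000000000055511151231257827021181583404541015625

# The doubles nearest to 1/3 and 1/5 are not equal to the true fractions.
third_is_exact = Fraction(1 / 3) == Fraction(1, 3)  # False
fifth_is_exact = Fraction(1 / 5) == Fraction(1, 5)  # False

# The small errors show up in ordinary arithmetic.
binary_sum_matches = (0.1 + 0.2) == 0.3  # False

# A base-10 type stores these particular values exactly.
decimal_sum_matches = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")  # True

print(third_is_exact, fifth_is_exact, binary_sum_matches, decimal_sum_matches)
```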

Since the set of floating-point numbers (in any representation) is finite, and the set of real numbers is infinite, one algorithm to find a real number which is not exactly representable as a floating-point number is to select a real number at random from a continuous range. The probability that such a number is exactly representable as a floating-point number is 0.

High Performance Mark answered Sep 25 '22 12:09


You generally need to be prepared for the possibility that any value you store in a double carries some small amount of error. Unless you're storing a constant that you know is exactly representable, chances are the value has at least some error. If it's imperative that there never be any error, and the values aren't constant, you probably shouldn't be using a floating-point type.

What you probably should be asking in many cases is, "How do I deal with the minor floating-point errors?" You'll want to know which types of operations can result in a lot of error, and which don't. You'll want to ensure that comparing two values for "equality" actually checks that they are "close enough" rather than exactly equal, etc.
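A minimal sketch of a "close enough" comparison, in Python on the assumption that its `float` behaves like a .NET `double`. The helper `nearly_equal` and its default tolerances are illustrative choices, not recommended values; suitable tolerances depend on the application.

```python
import math

# Ten additions of 0.1, each rounded, do not land exactly on 1.0.
total = sum([0.1] * 10)
exactly_one = total == 1.0  # False
close_to_one = math.isclose(total, 1.0, rel_tol=1e-9)  # True


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Compare with a relative tolerance, plus an absolute one for values near zero."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


print(exactly_one, close_to_one)
print(nearly_equal(0.1 + 0.2, 0.3))
```

The absolute tolerance matters because a purely relative test can never treat a tiny non-zero result as equal to 0.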

Servy answered Sep 22 '22 12:09