Precision loss from float to double, and from double to float?

float fv = original_value;  // original_value may be any float value
...
double dv = (double)fv;
...
fv = (float)dv;

Should fv be exactly equal to original_value, or can any precision be lost?

ravin.wang asked Apr 25 '16 12:04



1 Answer

Should fv be exactly equal to original_value, or can any precision be lost?

Yes, fv will be exactly equal to original_value and no precision is lost, provided the value of dv is not changed in between.

From section 6.3.1.5 (Real floating types), under Conversions, in the C99 specification:

  1. When a float is promoted to double or long double, or a double is promoted to long double, its value is unchanged.
  2. When a double is demoted to float, a long double is demoted to double or float, or a value being represented in greater precision and range than required by its semantic type (see 6.3.1.8) is explicitly converted to its semantic type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.
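
As a rough illustration of paragraph 2, the C sketch below checks whether a double value survives a trip through float. The helper name `double_survives_float` is made up for this example and does not come from the standard or the question. It assumes IEEE 754 float and double, which the C standard does not require.

```c
#include <stdbool.h>

/* Hypothetical helper: true if `value` is exactly representable as a float.
   Assumes IEEE 754 single and double precision. */
bool double_survives_float(double value)
{
    float narrowed = (float)value;    /* demotion: may round to a neighbouring float */
    return (double)narrowed == value; /* promotion back never changes the value */
}
```

Under that assumption, `double_survives_float(0.5)` is true, while `double_survives_float(0.1)` and `double_survives_float(16777217.0)` are false, because neither of those double values fits in a float's 24-bit significand.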

For C++, from section 4.6 [conv.fpprom] (draft used: n337; I believe the final standard has similar wording):

A prvalue of type float can be converted to a prvalue of type double. The value is unchanged. This conversion is called floating point promotion.

And from section 4.8 [conv.double]:

A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined. The conversions allowed as floating point promotions are excluded from the set of floating point conversions.

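So fv is exactly equal to original_value: the promotion to double leaves the value unchanged, and the demotion back to float is exact because that value is, by construction, representable as a float.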
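
A minimal C sketch of the round trip from the question follows. `survives_round_trip` is a hypothetical helper name used only for illustration. Note that `==` evaluates to false when the value is NaN, even though the conversions themselves lose nothing, so the check is only meaningful for non-NaN inputs.

```c
#include <stdbool.h>

/* Hypothetical helper: repeats the float -> double -> float sequence
   from the question and reports whether the value came back unchanged. */
bool survives_round_trip(float original_value)
{
    float fv = original_value;
    double dv = (double)fv;       /* promotion: value is unchanged */
    fv = (float)dv;               /* demotion: dv is exactly representable as float */
    return fv == original_value;  /* false for NaN, which never compares equal */
}
```

Any non-NaN float passed in, for example `0.1f` or `-2.5f`, should make the function return true on a conforming implementation.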

Mohit Jain answered Sep 18 '22 14:09