float fv = original_value; // original_value may be any float value
...
double dv = (double)fv;
...
fv = (float)dv;
Should fv be exactly equal to original_value? Can any precision be lost?
double has more than twice the precision of float: 53 significand bits versus 24. float is a 32-bit IEEE 754 single-precision floating-point number: 1 bit for the sign, 8 bits for the exponent, and 23 stored bits for the significand (24 bits of precision counting the implicit leading bit). float has about 7 decimal digits of precision.
Float and double: double is more precise than float and occupies 64 bits, twice the 32 bits a float occupies.
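To make the significand width concrete, here is a minimal sketch, assuming an IEEE 754 implementation. A float carries 24 bits of significand (23 stored plus one implicit leading bit), so 2^24 = 16777216 is the last point up to which every integer is exactly representable. The helper name `int_fits_in_float` is made up for this illustration.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if the integer n is exactly
   representable as a float, 0 otherwise.
   Assumes int is at most 32 bits wide, so (double)n is always exact. */
int int_fits_in_float(int n)
{
    float f = (float)n;             /* may round if n needs more than 24 bits */
    return (double)f == (double)n;  /* compare in double, where both are exact */
}
```

On such an implementation `int_fits_in_float(16777216)` returns 1 and `int_fits_in_float(16777217)` returns 0, while `FLT_MANT_DIG` and `DBL_MANT_DIG` from `<float.h>` report 24 and 53.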
Short answer: the max value for a double-precision value (assuming IEEE 754 floating-point) is exactly 2^1024 * (1 - 2^-53). For a single-precision value it's 2^128 * (1 - 2^-24).
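The two formulas can be checked against the `<float.h>` constants. This is only a sketch, assuming IEEE 754 binary32/binary64 and a C99 compiler (for the hexadecimal floating constants); the function names are invented for the example. Since 2^1024 itself does not fit in a double, the scaling is done as 2^1023 followed by a multiplication by 2.

```c
#include <float.h>

/* 2^128 * (1 - 2^-24), evaluated in double where every step is exact. */
double float_max_from_formula(void)
{
    return (1.0 - 0x1p-24) * 0x1p127 * 2.0;
}

/* 2^1024 * (1 - 2^-53). 2^1024 overflows a double, so scale by
   2^1023 first and then by 2; both products are exactly representable. */
double double_max_from_formula(void)
{
    return (1.0 - 0x1p-53) * 0x1p1023 * 2.0;
}
```

Under those assumptions the first function compares equal to `(double)FLT_MAX` and the second to `DBL_MAX`.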
Floating-point numbers lose precision because they are represented with a fixed number of bits (e.g., 32 or 64). There are infinitely many real numbers, even within a small range like 0.0 to 0.1, so most of them can only be approximated.
Should fv be exactly equal to original_value? Can any precision be lost?
Yes, fv will be exactly equal to original_value, and no precision is lost, provided the value of dv did not change in between.
From section 6.3.1.5 (Real floating types) of the C99 specification:
- When a float is promoted to double or long double, or a double is promoted to long double, its value is unchanged.
- When a double is demoted to float, a long double is demoted to double or float, or a value being represented in greater precision and range than required by its semantic type (see 6.3.1.8) is explicitly converted to its semantic type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined
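The second rule is why the opposite direction (double to float and back) is not lossless in general. A small sketch, with a made-up helper name, assuming the argument lies within the range of float (per the quoted text, an out-of-range value is undefined behavior in C99):

```c
/* Hypothetical helper: returns 1 if d survives double -> float -> double
   unchanged, i.e. d is exactly representable as a float.
   The caller is assumed to pass a value within the range of float. */
int survives_demotion(double d)
{
    float f = (float)d;     /* demotion: may round to a neighboring float */
    return (double)f == d;  /* promotion back is always exact */
}
```

On an IEEE 754 implementation, `survives_demotion(0.5)` returns 1 because 0.5 is a power of two, while `survives_demotion(0.1)` returns 0 because the double nearest to 0.1 needs more than 24 significand bits.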
For C++, from section 4.6, [conv.fpprom] (draft used: N3337; I believe similar wording appears in the final standard):
A prvalue of type float can be converted to a prvalue of type double. The value is unchanged. This conversion is called floating point promotion.
And from section 4.8, [conv.double]:
A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined. The conversions allowed as floating point promotions are excluded from the set of floating point conversions
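So the values should be exactly equal: the promotion to double leaves the value unchanged, and since that value came from a float, it can be represented exactly when it is converted back.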
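A quick way to convince yourself is a round-trip check like the sketch below. The function name is invented for the example. One caveat that comes from how comparison works rather than from the conversion: a NaN never compares equal to anything, including itself, so it is tested with `isnan` instead of `==`.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if f comes back unchanged after
   float -> double -> float, 0 otherwise. */
int round_trips(float f)
{
    double d = (double)f;   /* promotion: value is unchanged */
    float back = (float)d;  /* demotion: d is exactly representable as float */

    if (isnan(f))           /* NaN != NaN, so check NaN-ness instead */
        return isnan(back) != 0;
    return back == f;
}
```

This should return 1 for every float you pass in, including `FLT_MAX`, `FLT_MIN`, negative zero and the infinities.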