From my previous question "Is floating point precision mutable or invariant?", I received a response which said,
C provides DBL_DIG, DBL_DECIMAL_DIG, and their float and long double counterparts. DBL_DIG indicates the minimum relative decimal precision. DBL_DECIMAL_DIG can be thought of as the maximum relative decimal precision.
I looked these macros up. They are found in the header <cfloat>. The cplusplus reference page lists macros for float, double, and long double.
Here are the macros for minimum precision values.
FLT_DIG 6 or greater
DBL_DIG 10 or greater
LDBL_DIG 10 or greater
If I took these macros at face value, I would assume that a float has a minimum decimal precision of 6, while a double and long double have a minimum decimal precision of 10. However, being a big boy, I know that some things may be too good to be true.
Therefore, I would like to know: do floats, doubles, and long doubles have a guaranteed minimum decimal precision, and is that minimum the value of the macros given above?
If not, why?
Note: Assume we are using programming language C++.
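double has roughly twice the precision of float (53 versus 24 significand bits). A float is a 32-bit IEEE 754 single-precision floating-point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the value (the stored fraction, which is preceded by an implicit leading 1 bit for normal numbers). A float has about 7 decimal digits of precision.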
A float stores 23 bits of mantissa, and 2^23 is 8,388,608. Those 23 bits let you store every number with 6 or fewer significant digits, and most 7-digit numbers. This means that a float has between 6 and 7 decimal digits of precision, regardless of exponent.
A variable of type float has only about 7 digits of precision, whereas a variable of type double has about 15.
A float has about 7 decimal digits of precision and occupies 32 bits. A double is a 64-bit IEEE 754 double-precision floating-point number: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the value. A double has about 15 decimal digits of precision.
If std::numeric_limits<F>::is_iec559 is true, then the guarantees of the IEEE 754 standard apply to floating point type F.
Otherwise (and anyway), the minimum permitted values of symbols such as DBL_DIG are specified by the C standard, which, indisputably for the library, “is incorporated into [the C++] International Standard by reference”, as quoted from C++11 §17.5.1.5/1.
Edit: As noted by TC in a comment, “<climits> and <cfloat> are normatively incorporated by §18.3.3 [c.limits]; the minimum values are specified in turn in §5.2.4.2.2 of the C standard”.
Unfortunately for the formal view, first, that quote from C++11 is from section 17.5, which is only informative, not normative. Second, the wording in the C standard stating that the values specified there are minimums is also in a section (the C99 standard's Annex E) that is informative, not normative. So while it can be regarded as an in-practice guarantee, it is not a formal guarantee.
One strong indication that the in-practice minimum precision for float is 6 decimal digits (that is, that no implementation will give less): output operations default to precision 6, and this is normative text.
Disclaimer: It may be that there is additional wording that provides guarantees that I didn't notice. Not very likely, but possible.