Do floats, doubles, and long doubles have a guaranteed minimum precision?

From my previous question "Is floating point precision mutable or invariant?" I received a response which said,

C provides DBL_DIG, DBL_DECIMAL_DIG, and their float and long double counterparts. DBL_DIG indicates the minimum relative decimal precision. DBL_DECIMAL_DIG can be thought of as the maximum relative decimal precision.

I looked these macros up. They are defined in the header <cfloat>, and the cplusplus.com reference page lists them for float, double, and long double.

Here are the macros for minimum precision values.

FLT_DIG 6 or greater

DBL_DIG 10 or greater

LDBL_DIG 10 or greater

If I took these macros at face value, I would assume that a float has a minimum decimal precision of 6, while a double and long double have a minimum decimal precision of 10. However, being a big boy, I know that some things may be too good to be true.
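To make the question concrete, here is a minimal sketch of how those lower bounds could be checked at compile time on a given implementation. The helper name `min_decimal_digits` is hypothetical (it is not part of any library), and the bounds 6, 10, and 10 are simply the values from the list above.

```cpp
#include <cfloat>
#include <limits>

// Lower bounds taken from the list above. If an implementation ever
// defined smaller values, compilation would stop here.
static_assert(FLT_DIG  >= 6,  "FLT_DIG is below the documented minimum");
static_assert(DBL_DIG  >= 10, "DBL_DIG is below the documented minimum");
static_assert(LDBL_DIG >= 10, "LDBL_DIG is below the documented minimum");

// std::numeric_limits<T>::digits10 is the C++ spelling of the same
// quantity as the *_DIG macros.
static_assert(std::numeric_limits<float>::digits10 == FLT_DIG,
              "digits10 and FLT_DIG disagree");
static_assert(std::numeric_limits<double>::digits10 == DBL_DIG,
              "digits10 and DBL_DIG disagree");
static_assert(std::numeric_limits<long double>::digits10 == LDBL_DIG,
              "digits10 and LDBL_DIG disagree");

// Hypothetical helper: number of decimal digits that survive a
// text -> F -> text round trip for floating point type F.
template <typename F>
constexpr int min_decimal_digits() {
    return std::numeric_limits<F>::digits10;
}
```

Calling `min_decimal_digits<double>()` returns whatever the implementation actually provides, which may well exceed the minimum.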

Therefore, I would like to know: do floats, doubles, and long doubles have a guaranteed minimum decimal precision, and is that minimum given by the values of the macros above?

If not, why?


Note: Assume we are using the C++ programming language.

Wandering Fool asked Jun 02 '15 05:06



1 Answer

If std::numeric_limits<F>::is_iec559 is true, then the guarantees of the IEEE 754 standard apply to floating point type F.
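As a rough sketch, the check could be wrapped like this. The helper names `follows_iec559` and `significand_bits` are hypothetical, introduced only for illustration; `is_iec559` and `digits` are the standard `std::numeric_limits` members.

```cpp
#include <limits>

// Hypothetical helper: true when the implementation claims that F
// conforms to IEC 559 (IEEE 754).
template <typename F>
constexpr bool follows_iec559() {
    return std::numeric_limits<F>::is_iec559;
}

// Hypothetical helper: number of radix digits in the significand of F
// (including the implicit leading bit on binary formats).
template <typename F>
constexpr int significand_bits() {
    return std::numeric_limits<F>::digits;
}
```

On the common platforms where float and double are IEEE 754 binary32 and binary64, `significand_bits<float>()` is 24 and `significand_bits<double>()` is 53, but that is a property of those platforms, not something this snippet can promise elsewhere.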

Otherwise (and anyway), the minimum permitted values of symbols such as DBL_DIG are specified by the C standard, which, indisputably for the library, “is incorporated into [the C++] International Standard by reference”, as quoted from C++11 §17.5.1.5/1.

Edit: As noted by TC in a comment here,

<climits> and <cfloat> are normatively incorporated by §18.3.3 [c.limits]; the minimum values are specified in turn in §5.2.4.2.2 of the C standard

Unfortunately for the formal view, that quote from C++11 comes from section 17.5, which is only informative, not normative. Moreover, the wording in the C standard saying that the values specified there are minimums is also in an informative section (the C99 standard's Annex E), not a normative one. So while it can be regarded as an in-practice guarantee, it's not a formal guarantee.


One strong indication that the in-practice minimum precision for float is 6 decimal digits, and that no implementation will give less:

output operations default to precision 6, and this is normative text.
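A small sketch of that default, using a freshly constructed stream so that no earlier formatting state interferes. The helper names `default_precision` and `format_default` are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: the precision a freshly constructed stream starts with.
std::streamsize default_precision() {
    std::ostringstream out;
    return out.precision();
}

// Hypothetical helper: format a value with the stream's default settings
// (no std::fixed, no std::scientific, no std::setprecision).
std::string format_default(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

With these defaults, `format_default(3.14159265358979)` yields "3.14159", that is, six significant digits, regardless of how many more digits the underlying type can hold.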

Disclaimer: It may be that there is additional wording that provides guarantees that I didn't notice. Not very likely, but possible.

Cheers and hth. - Alf answered Sep 30 '22 12:09