How do functions such as printf
extract digits from a floating point number? I understand how this could be done in principle. Given a number x
, of which you want the first n
digits, scale x
by a power of 10 so that x
is between pow(10, n)
and pow(10, n-1)
. Then convert x
into an integer, and take the digits of the integer.
I tried this, and it worked. Sort of. My answer was identical to the answer given by printf
for the first 16 decimal digits, but tended to differ on the ones after that. How does printf
do it?
You can do it like this: printf("%. 6f", myFloat); 6 represents the number of digits after the decimal separator.
We can print the double value using both %f and %lf format specifier because printf treats both float and double are same.
To read a floating-point number (fractional number, non-integer) from the console use the following command: var num = double. Parse(Console.
Real numbers are represented in C by the floating point types float, double, and long double. Just as the integer types can't represent all integers because they fit in a bounded number of bytes, so also the floating-point types can't represent all real numbers.
The classic implementation is David Gay's dtoa
. The exact details are somewhat arcane (see Why does "dtoa.c" contain so much code?), but in general it works by doing the base conversion using more precision beyond what you can get from a 32-bit, 64-bit, or even 80-bit floating point number. To do this, it uses so-called "bigints" or arbitrary-precision numbers, which can hold as many digits as you can fit in memory. Gay's code has been copied, with modifications, into countless other libraries including common implementations for the C standard library (so it might power your printf
), Java, Python, PHP, JavaScript, etc.
(As a side note... not all of these copies of Gay's dtoa code were kept up to date, so because PHP used an old version of strtod it hung when parsing 2.2250738585072011e-308.)
In general, if you do things the "obvious" and simple way like multiplying by a power of 10 and then converting the integer, you will lose a small amount of precision and some of the results will be inaccurate... but maybe you will get the first 14 or 15 digits correct. Gay's implementation of dtoa() claims to get all the digits correct... but as a result, the code is quite difficult to follow. Skip to the bottom to see strtod itself, you can see that it starts with a "fast path" which just uses ordinary floating-point arithmetic, but then it detects if that result is incorrect and uses a more reliable algorithm using bigints which works in all cases (but is slower).
The implementation has the following citation, which you may find interesting:
* Inspired by "How to Print Floating-Point Numbers Accurately" by * Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 112-126].
The algorithm works by calculating a range of decimal numbers which produce the given binary number, and by using more digits, the range gets smaller and smaller until you either have an exact result or you can correctly round to the requested number of digits.
In particular, from sec 2.2 Algorithm,
The algorithm uses exact rational arithmetic to perform its computations so that there is no loss of accuracy. In order to generate digits, the algorithm scales the number so that it is of the form 0.d1d2..., where d1, d2, ..., are base-B digits. The first digit is computed by multiplying the scaled number by the output base, B, and taking the integer part. The remainder is used to compute the rest of the digits using the same approach.
The algorithm can then continue until it has the exact result (which is always possible, since floating-point numbers are base 2, and 2 is a factor of 10) or until it has as many digits as requested. The paper goes on to prove the algorithm's correctness.
Also note that not all implementations of printf
are based on Gay's dtoa, this is just a particularly common implementation that's been copied a lot.
There are various ways to convert floating-point numbers to decimal numerals without error (either exactly or with rounding to a desired precision).
One method is to use arithmetic as taught in elementary school. C provides functions to work with floating-point numbers, such as frexp
, which separates the fraction (also called the significand, often mistakenly called a mantissa) and the exponent. Given a floating-point number, you could create a large array to store decimal digits in and then compute the digits. Each bit in the fraction part of a floating-point number represents some power of two, as determined by the exponent in the floating-point number. So you can simply put a “1” in an array of digits and then use elementary school arithmetic to multiply or divide it the required number of times. You can do that for each bit and then add all the results, and the sum is the decimal numeral that equals the floating-point number.
Commercial printf
implementations will use more sophisticated algorithms. Discussing them is beyond the scope of a Stack Overflow question-and-answer. The seminal paper on this is Correctly Rounded Binary-Decimal and Decimal-Binary Conversions by David M. Gay. (A copy appears to be available here, but that seems to be hosted by a third party; I am not sure how official or durable it is. A web search may turn up other sources.) A more recent paper with an algorithm for converting a binary floating-point number to decimal with the shortest number of digits needed to uniquely distinguish the value is Printing Floating-Point Numbers: An Always Correct Method by Marc Andrysco, Ranjit Jhala, and Sorin Lerner.
One key to how it is done is that printf
will not just use the floating-point format and its operations to do the work. It will use some form of extended-precision arithmetic, either by working with parts of the floating-point number in an integer format with more bits, by separating the floating-point number into pieces and using multiple floating-point numbers to work with it, or by using a floating-point format with more precision.
Note that the first step in your question, multiple x by a power of ten, already has two rounding errors. First, not all powers of ten are exactly representable in binary floating-point, so just producing such a power of ten necessarily has some representation error. Then, multiplying x
by another number often produces a mathematical result that is not exactly representable, so it must be rounded to the floating-point format.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With