How does printf extract digits from a floating point number?

Tags:

c++

c

floating-point

printf

How do functions such as printf extract digits from a floating point number? I understand how this could be done in principle. Given a number x, of which you want the first n digits, scale x by a power of 10 so that x is between pow(10, n) and pow(10, n-1). Then convert x into an integer, and take the digits of the integer.

I tried this, and it worked. Sort of. My answer was identical to the answer given by printf for the first 16 decimal digits, but tended to differ on the ones after that. How does printf do it?

asked Jun 26 '18 22:06

Alecto Irene Perez

2 Answers

The classic implementation is David Gay's dtoa. The exact details are somewhat arcane (see Why does "dtoa.c" contain so much code?), but in general it works by performing the base conversion with more precision than a 32-bit, 64-bit, or even 80-bit floating-point number can provide. To do this, it uses so-called "bigints", or arbitrary-precision integers, which can hold as many digits as fit in memory. Gay's code has been copied, with modifications, into countless other libraries, including common implementations of the C standard library (so it might power your printf), Java, Python, PHP, JavaScript, etc.

(As a side note: not all of these copies of Gay's code were kept up to date. Because PHP used an old version of strtod, it hung when parsing 2.2250738585072011e-308.)

In general, if you do things the "obvious" and simple way, multiplying by a power of 10 and then converting to an integer, you will lose a small amount of precision and some of the results will be inaccurate, although you will probably get the first 14 or 15 digits correct. Gay's implementation of dtoa() claims to get all the digits correct, but as a result the code is quite difficult to follow. If you skip to the bottom to see strtod itself, you can see that it starts with a "fast path" which just uses ordinary floating-point arithmetic. It then detects if that result is incorrect and falls back to a more reliable algorithm using bigints, which works in all cases (but is slower).
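To make the "fast path" idea concrete, here is a minimal sketch of one well-known form of it. This is not taken from Gay's source. The function name fast_path_to_double and the table POW10 are made up for illustration. It assumes IEEE 754 doubles and that double arithmetic is not carried out in a wider format (FLT_EVAL_METHOD == 0). The idea: if the decimal significand fits in 53 bits and the power of ten is at most 10^22, both operands are exact doubles, so a single multiplication or division rounds only once and gives the correctly rounded result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Powers of ten from 10^0 to 10^22 are exactly representable as doubles. */
static const double POW10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/*
 * Hypothetical helper: try to convert w * 10^e10 to the nearest double with
 * a single floating-point operation. Returns 1 and stores the result on
 * success, or 0 if the caller must fall back to a slower exact method.
 */
int fast_path_to_double(uint64_t w, int e10, double *out) {
    assert(out != NULL);
    if (w > (UINT64_C(1) << 53) || e10 < -22 || e10 > 22)
        return 0;                  /* an operand might not be exact */
    double d = (double)w;          /* exact: w fits in the 53-bit significand */
    *out = (e10 < 0) ? d / POW10[-e10] : d * POW10[e10];
    return 1;
}
```

For example, fast_path_to_double(123456789, -4, &d) succeeds and stores 12345.6789, while fast_path_to_double(1, 23, &d) returns 0 because 10^23 is not exactly representable.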

The implementation has the following citation, which you may find interesting:

 * Inspired by "How to Print Floating-Point Numbers Accurately" by
 * Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 112-126].

The algorithm works by calculating a range of decimal numbers which produce the given binary number, and by using more digits, the range gets smaller and smaller until you either have an exact result or you can correctly round to the requested number of digits.

In particular, from sec 2.2 Algorithm,

The algorithm uses exact rational arithmetic to perform its computations so that there is no loss of accuracy. In order to generate digits, the algorithm scales the number so that it is of the form 0.d₁d₂..., where d₁, d₂, ..., are base-B digits. The first digit is computed by multiplying the scaled number by the output base, B, and taking the integer part. The remainder is used to compute the rest of the digits using the same approach.

The algorithm can then continue until it has the exact result (which is always possible, since floating-point numbers are base 2, and 2 is a factor of 10) or until it has as many digits as requested. The paper goes on to prove the algorithm's correctness.
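The digit loop quoted above can be sketched in a few lines if you restrict it to a case where plain 64-bit integers are enough. This is only an illustration, not the paper's full algorithm or Gay's code. The function name fraction_digits is made up, and it assumes the value is a fraction r / 2^k with r < 2^k and k <= 60, so that multiplying by 10 cannot overflow. A real implementation needs bigints because k can be over a thousand for a double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generate decimal digits of the fraction r / 2^k, where 0 <= r < 2^k and
 * k <= 60 (so r * 10 cannot overflow 64 bits). Writes at most max_digits
 * digits plus a terminating NUL into out and returns the digit count.
 * If max_digits is reached first, the digits are truncated, not rounded.
 */
size_t fraction_digits(uint64_t r, unsigned k, size_t max_digits, char *out) {
    assert(out != NULL && k <= 60 && r < (UINT64_C(1) << k));
    const uint64_t mask = (UINT64_C(1) << k) - 1;
    size_t n = 0;
    while (r != 0 && n < max_digits) {
        r *= 10;                            /* multiply by the output base */
        out[n++] = (char)('0' + (r >> k));  /* integer part is the next digit */
        r &= mask;                          /* remainder feeds the next step */
    }
    out[n] = '\0';
    return n;
}
```

With r = 3 and k = 3 (the value 3/8) this produces "375" and stops because the remainder reaches zero. The double nearest to 0.1 is 7205759403792794 / 2^56; asking for 20 digits of that gives "10000000000000000555", and letting it run to completion gives 55 digits.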

Also note that not all implementations of printf are based on Gay's dtoa; it is just a particularly common implementation that has been copied a lot.

answered Nov 10 '22 18:11

Dietrich Epp

There are various ways to convert floating-point numbers to decimal numerals without error (either exactly or with rounding to a desired precision).

One method is to use arithmetic as taught in elementary school. C provides functions to work with floating-point numbers, such as frexp, which separates the fraction (also called the significand, often mistakenly called a mantissa) and the exponent. Given a floating-point number, you could create a large array to store decimal digits in and then compute the digits. Each bit in the fraction part of a floating-point number represents some power of two, as determined by the exponent in the floating-point number. So you can simply put a “1” in an array of digits and then use elementary school arithmetic to multiply or divide it the required number of times. You can do that for each bit and then add all the results, and the sum is the decimal numeral that equals the floating-point number.
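As a rough sketch of the schoolbook approach, here is a variant that uses an array of decimal digits. It differs slightly from the bit-by-bit description above: instead of summing one power of two per bit, it uses the identity m / 2^k = (m * 5^k) / 10^k, so the only arithmetic needed is repeated multiplication of a digit array by 5. The names exact_decimal, mul_small and MAX_DIGITS are made up, and the sketch assumes the integer significand m and the exponent k have already been extracted (for example with frexp).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DIGITS 1200  /* roomy enough for any finite double */

/* Multiply a little-endian decimal digit array by a small factor, schoolbook
 * style, and return the new digit count. */
static size_t mul_small(unsigned char *d, size_t len, unsigned factor) {
    unsigned carry = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned t = d[i] * factor + carry;
        d[i] = (unsigned char)(t % 10);
        carry = t / 10;
    }
    while (carry != 0) {
        d[len++] = (unsigned char)(carry % 10);
        carry /= 10;
    }
    return len;
}

/*
 * Write the exact decimal expansion of m / 2^k into out (NUL-terminated).
 * Returns the string length, or 0 if k or the output buffer is out of range.
 * Trailing zeros are not stripped.
 */
size_t exact_decimal(uint64_t m, unsigned k, char *out, size_t out_size) {
    unsigned char d[MAX_DIGITS];
    size_t len = 0;
    assert(out != NULL);
    if (k > 1100)
        return 0;
    memset(d, 0, sizeof d);
    do {                                  /* decimal digits of m, low first */
        d[len++] = (unsigned char)(m % 10);
        m /= 10;
    } while (m != 0);
    for (unsigned i = 0; i < k; i++)      /* d now holds m * 5^k */
        len = mul_small(d, len, 5);
    if (len < (size_t)k + 1)              /* pad so an integer digit exists */
        len = (size_t)k + 1;
    if (len + 2 > out_size)
        return 0;
    size_t n = 0;
    for (size_t i = len; i-- > 0; ) {     /* most significant digit first */
        out[n++] = (char)('0' + d[i]);
        if (i == k && k != 0)
            out[n++] = '.';               /* point sits k digits from the right */
    }
    out[n] = '\0';
    return n;
}
```

For example, exact_decimal(3, 3, buf, sizeof buf) writes "0.375", and passing the significand and exponent of the double nearest to 0.1 (7205759403792794 and 56) writes "0.10000000000000000555111512312578270211815834045410156250".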

Commercial printf implementations will use more sophisticated algorithms. Discussing them is beyond the scope of a Stack Overflow question-and-answer. The seminal paper on this is Correctly Rounded Binary-Decimal and Decimal-Binary Conversions by David M. Gay. (A copy appears to be available here, but that seems to be hosted by a third party; I am not sure how official or durable it is. A web search may turn up other sources.) A more recent paper with an algorithm for converting a binary floating-point number to decimal with the shortest number of digits needed to uniquely distinguish the value is Printing Floating-Point Numbers: An Always Correct Method by Marc Andrysco, Ranjit Jhala, and Sorin Lerner.
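To see what "shortest number of digits needed to uniquely distinguish the value" means in practice, here is a brute-force sketch. It is not the algorithm from either paper. It simply asks the C library for more and more significant digits until the text parses back to the same double. The function name shortest_round_trip is made up, and the sketch assumes the library's printf and strtod are correctly rounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Brute-force illustration: find the smallest precision (1..17 significant
 * digits) at which the C library's own output parses back to exactly x.
 * Returns that precision and leaves the text in out, or 0 on failure.
 */
int shortest_round_trip(double x, char *out, size_t out_size) {
    assert(out != NULL);
    for (int prec = 1; prec <= 17; prec++) {
        int n = snprintf(out, out_size, "%.*g", prec, x);
        if (n < 0 || (size_t)n >= out_size)
            return 0;                   /* buffer too small */
        if (strtod(out, NULL) == x)
            return prec;                /* this text identifies x uniquely */
    }
    return 0;                           /* e.g. NaN never compares equal */
}
```

With a correctly rounding library, 0.1 needs 1 digit ("0.1"), 1.0 / 3.0 needs 16, and 0.1 + 0.2 needs all 17 ("0.30000000000000004"). The dedicated algorithms get the same kind of answer without the repeated formatting and parsing.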

One key to how it is done is that printf will not just use the floating-point format and its operations to do the work. It will use some form of extended-precision arithmetic, either by working with parts of the floating-point number in an integer format with more bits, by separating the floating-point number into pieces and using multiple floating-point numbers to work with it, or by using a floating-point format with more precision.
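As an illustration of the first option, working with the parts of the number in an integer format, here is a sketch that splits a double into an integer significand and a power-of-two exponent. The function name decompose is made up, and the code assumes double is IEEE 754 binary64, which the C standard does not guarantee. The sign bit is ignored, so it describes the magnitude of x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Split a finite double into integers m and e such that |x| == m * 2^e
 * exactly. Assumes double is IEEE 754 binary64.
 */
void decompose(double x, uint64_t *m, int *e) {
    uint64_t bits;
    assert(m != NULL && e != NULL && sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);              /* well-defined type pun */
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    int biased = (int)((bits >> 52) & 0x7FF);
    assert(biased != 0x7FF);                     /* infinities and NaNs excluded */
    if (biased == 0) {                           /* zero or subnormal */
        *m = frac;
        *e = -1074;
    } else {                                     /* normal: implicit leading 1 */
        *m = frac | (UINT64_C(1) << 52);
        *e = biased - 1075;
    }
}
```

For 0.1 this yields m = 7205759403792794 and e = -56. From there, all further work can be done on m with integer or bigint arithmetic, with no floating-point rounding involved.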

Note that the first step in your question, multiplying x by a power of ten, already has two rounding errors. First, not all powers of ten are exactly representable in binary floating-point, so just producing such a power of ten necessarily has some representation error. Then, multiplying x by another number often produces a mathematical result that is not exactly representable, so it must be rounded to the floating-point format.
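Here is a small sketch of the naive method from the question, so you can see where the digits go wrong. The function name naive_digits is made up, and it scales by repeated multiplication or division by 10 rather than calling pow, which is an assumption about how the question's code worked. It assumes IEEE 754 doubles.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/*
 * Naive digit extraction as described in the question: scale x into
 * [10^(n-1), 10^n) with ordinary double arithmetic, then truncate to an
 * integer. Requires finite x > 0 and 1 <= n <= 19 so the result fits in
 * 64 bits.
 */
uint64_t naive_digits(double x, int n) {
    assert(x > 0.0 && x <= DBL_MAX && n >= 1 && n <= 19);
    double lo = 1.0;
    for (int i = 1; i < n; i++)
        lo *= 10.0;                /* 10^(n-1), exact for n <= 19 */
    double hi = lo * 10.0;         /* 10^n, also exact here */
    while (x < lo)
        x *= 10.0;                 /* each step may round */
    while (x >= hi)
        x /= 10.0;                 /* each step may round */
    return (uint64_t)x;
}
```

naive_digits(0.1, 19) returns 1000000000000000000, because the very first multiplication rounds 0.1 * 10 to exactly 1.0 and the information in the low bits is gone. A correctly rounding printf("%.18e", 0.1) prints 1.000000000000000056e-01, so the two agree on the leading digits and differ at the end, which matches what you observed.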

answered Nov 10 '22 19:11

Eric Postpischil
