
How do you print the EXACT value of a floating point number?

First of all, this is not a floating point newbie question. I know results of floating point arithmetic (not to mention transcendental functions) usually cannot be represented exactly, and that most terminating decimals cannot be represented exactly as binary floating point numbers.

That said, each possible floating point value corresponds exactly to a dyadic rational (a rational number p/q where q is a power of 2), which in turn has an exact decimal representation.

My question is: How do you find this exact decimal representation efficiently? sprintf and similar functions are usually specified only up to enough significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation. I know one algorithm I've used, but it's very slow: O(e^2), where e is the exponent. Here's an outline:

  1. Convert the mantissa to a decimal integer. You can either do this by pulling apart the bits to read the mantissa directly, or you can write a messy floating point loop that first multiplies the value by a power of two to put it in the range 1<=x<10, then pulls off a digit at a time by casting to int, subtracting, and multiplying by 10.
  2. Apply the exponent by repeatedly multiplying or dividing by 2. This is an operation on the string of decimal digits you generated. Every ~3 multiplications (one new digit per log2(10) ≈ 3.32 doublings) adds an extra digit on the left; every single division adds an extra digit on the right. (A sketch of this approach follows below.)
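
To make the outline concrete, here is a minimal C sketch of this slow-but-exact method (my code, not a reference implementation: positive finite doubles only, fixed-size buffers, and frexp/ldexp standing in for pulling the bits apart by hand in step 1):

    #include <stdio.h>
    #include <string.h>
    #include <math.h>
    #include <stdint.h>

    /* The O(e^2) method from the question: write the mantissa in decimal,
       then apply the binary exponent one doubling or halving of the digit
       string at a time. */

    #define MAXD 1200               /* enough digits for any finite double */
    static char dig[MAXD];          /* one decimal digit per byte */
    static int ndig, point;         /* total digits; digits left of the '.' */

    static void double_digits(void) {
        int carry = 0;
        for (int i = ndig - 1; i >= 0; i--) {
            int t = dig[i] * 2 + carry;
            dig[i] = t % 10;
            carry = t / 10;
        }
        if (carry) {                /* new digit appears on the left */
            memmove(dig + 1, dig, ndig);
            dig[0] = (char)carry;
            ndig++; point++;
        }
    }

    static void halve_digits(void) {
        int carry = 0;
        for (int i = 0; i < ndig; i++) {
            int t = carry * 10 + dig[i];
            dig[i] = t / 2;
            carry = t % 2;
        }
        if (carry) dig[ndig++] = 5; /* new digit appears on the right */
    }

    int main(void) {
        double x = 0.1;             /* value to print exactly */
        int e;
        int64_t m = (int64_t)ldexp(frexp(x, &e), 53); /* x = m * 2^(e-53) */
        e -= 53;

        char buf[24];
        ndig = point = sprintf(buf, "%lld", (long long)m);
        for (int i = 0; i < ndig; i++) dig[i] = buf[i] - '0';

        while (e > 0) { double_digits(); e--; }
        while (e < 0) { halve_digits(); e++; }

        int lead = 0;               /* trim leading zeros, keep one */
        while (lead < point - 1 && dig[lead] == 0) lead++;
        while (ndig > point && dig[ndig - 1] == 0) ndig--;  /* trim trailing */

        for (int i = lead; i < point; i++) putchar('0' + dig[i]);
        if (ndig > point) {
            putchar('.');
            for (int i = point; i < ndig; i++) putchar('0' + dig[i]);
        }
        putchar('\n');
        return 0;
    }

For x = 0.1 this prints 0.1000000000000000055511151231257827021181583404541015625, the exact value of the nearest double to 0.1.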

Is this really the best possible? I doubt it, but I'm not a floating-point expert and I can't find a way to do the base-10 computations on the floating point representation of the number without running into a possibility of inexact results (multiplying or dividing by anything but a power of 2 is a lossy operation on floating point numbers unless you know you have free bits to work with).

asked Jul 09 '10 by R.. GitHub STOP HELPING ICE



2 Answers

This question has a bureaucratic part and an algorithmic part. A floating point number is stored internally as (2^e × m), where e is an exponent (itself in binary) and m is a mantissa. The bureaucratic part of the question is how to access this data, but R. seems more interested in the algorithmic part of the question, namely, converting (2^e × m) to a fraction (a/b) in decimal form. The answer to the bureaucratic question in several languages is frexp (which is an interesting detail that I didn’t know before today).
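
To make the bureaucratic part concrete, here is a minimal C illustration (my example, assuming an IEEE 754 double): frexp splits x into a fraction in [0.5, 1) and a binary exponent, and ldexp scales that fraction up to a 53-bit integer mantissa.

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void) {
        double x = 0.1;
        int e;
        double f = frexp(x, &e);            /* x = f * 2^e, 0.5 <= f < 1 */
        int64_t m = (int64_t)ldexp(f, 53);  /* x = m * 2^(e-53) exactly */
        printf("x = %lld * 2^%d\n", (long long)m, e - 53);
        return 0;
    }

For 0.1 this prints x = 7205759403792794 * 2^-56.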

It is true that at first glance, it takes O(e^2) work just to write 2^e in decimal, and more time still for the mantissa. But, thanks to the magic of the Schönhage–Strassen fast multiplication algorithm, you can do it in Õ(e) time, where the tilde means “up to log factors”. If you view Schönhage–Strassen as magic, then it’s not that hard to think of what to do. If e is even, you can recursively compute 2^(e/2), and then square it using fast multiplication. On the other hand if e is odd, you can recursively compute 2^(e−1) and then double it. You have to be careful to check that there is a version of Schönhage–Strassen in base 10. Although it is not widely documented, it can be done in any base.
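
To show the shape of that recursion, here is a self-contained C sketch (mine, not from the answer) that computes 2^e directly on base-10 digit arrays. The schoolbook dec_mul is only a placeholder: substituting a base-10 fast multiplication at that one spot is what yields the Õ(e) bound.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { unsigned char *d; size_t len; } Dec; /* d[0] = least significant digit */

    static Dec dec_single(unsigned char v) {
        Dec r = { malloc(1), 1 };
        r.d[0] = v;
        return r;
    }

    /* Schoolbook O(n^2) multiply in base 10 -- stand-in for the
       base-10 Schönhage–Strassen multiply the answer describes. */
    static Dec dec_mul(Dec a, Dec b) {
        Dec r = { calloc(a.len + b.len, 1), a.len + b.len };
        for (size_t i = 0; i < a.len; i++) {
            unsigned carry = 0;
            for (size_t j = 0; j < b.len; j++) {
                unsigned t = r.d[i + j] + a.d[i] * b.d[j] + carry;
                r.d[i + j] = t % 10;
                carry = t / 10;
            }
            r.d[i + b.len] += carry;
        }
        while (r.len > 1 && r.d[r.len - 1] == 0) r.len--;  /* trim leading zeros */
        return r;
    }

    static Dec pow2(unsigned e) {
        if (e == 0) return dec_single(1);
        Dec h, r;
        if (e % 2 == 0) {             /* even: 2^e = (2^(e/2))^2 */
            h = pow2(e / 2);
            r = dec_mul(h, h);
        } else {                      /* odd: 2^e = 2 * 2^(e-1) */
            Dec two = dec_single(2);
            h = pow2(e - 1);
            r = dec_mul(h, two);
            free(two.d);
        }
        free(h.d);
        return r;
    }

    int main(void) {
        Dec p = pow2(64);
        for (size_t i = p.len; i-- > 0; ) putchar('0' + p.d[i]);
        putchar('\n');                /* 18446744073709551616 */
        free(p.d);
        return 0;
    }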

Converting a very long mantissa from binary to base 10 is not exactly the same question, but it has a similar answer. You can divide the mantissa into two halves, m = a × 2^k + b. Then recursively convert a and b to base 10, convert 2^k to base 10, and do another fast multiplication to compute m in base 10.

The abstract result behind all of this is that you can convert integers from one base to another in Õ(N) time.

If the question is about standard 64-bit floating point numbers, then it’s too small for the fancy Schönhage–Strassen algorithm. In this range you can instead save work with various tricks. One approach is to store all 2048 values of 2^e in a lookup table, and then work in the mantissa with asymmetric multiplication (in between long multiplication and short multiplication). Another trick is to work in base 10000 (or a higher power of 10, depending on architecture) instead of base 10. But, as R. points out in the comments, 128-bit floating point numbers already allow large enough exponents to call into question both lookup tables and standard long multiplication. As a practical matter, long multiplication is the fastest up to a handful of digits, then in a significant medium range one can use Karatsuba multiplication or Toom–Cook multiplication, and then after that a variation of Schönhage–Strassen is best not just in theory but also in practice.
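
The base-10000 trick amounts to packing four decimal digits into each limb, so one machine multiply advances four digits at a time. A toy illustration (my sketch, not from the answer):

    #include <stdio.h>
    #include <stdint.h>

    #define NLIMBS 8

    /* Double a number stored in base-10000 limbs, least significant first. */
    static void double_base10000(uint16_t limb[NLIMBS]) {
        unsigned carry = 0;
        for (int i = 0; i < NLIMBS; i++) {
            unsigned t = limb[i] * 2u + carry;
            limb[i] = (uint16_t)(t % 10000);
            carry = t / 10000;
        }
    }

    int main(void) {
        uint16_t n[NLIMBS] = {5536, 6};  /* 65536 in base 10000 */
        double_base10000(n);
        printf("%u%04u\n", (unsigned)n[1], (unsigned)n[0]);  /* 131072 */
        return 0;
    }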

Actually, the big integer package GMP already has Õ(N)-time radix conversion, as well as good heuristics for choosing among multiplication algorithms. The only difference between their solution and mine is that instead of doing any big arithmetic in base 10, they compute large powers of 10 in base 2. In this solution, they also need fast division, but that can be obtained from fast multiplication in any of several ways.
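
For example, GMP’s mpz_get_str performs that radix conversion, and it is subquadratic for large operands (link with -lgmp):

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <gmp.h>

    int main(void) {
        mpz_t n;
        mpz_init(n);
        mpz_ui_pow_ui(n, 2, 1000000);          /* n = 2^1000000 */
        char *dec = mpz_get_str(NULL, 10, n);  /* binary -> decimal */
        printf("2^1000000 has %zu decimal digits\n", strlen(dec));  /* 301030 */
        printf("first 20: %.20s...\n", dec);
        free(dec);   /* GMP's default allocator is malloc */
        mpz_clear(n);
        return 0;
    }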

answered Oct 09 '22 by Greg Kuperberg


I see you’ve accepted an answer already, but here are a couple of open source implementations of this conversion you might want to look at:

  1. David Gay’s dtoa() function in dtoa.c: https://www.netlib.org/fp/dtoa.c.

  2. The function ___printf_fp() in the /stdio-common/printf_fp.c file in Glibc (https://ftp.gnu.org/gnu/glibc/glibc-2.11.2.tar.gz, for example).

Both will print as many digits as you ask for in a %f-type printf, as I’ve written about at:

  • https://www.exploringbinary.com/print-precision-of-dyadic-fractions-varies-by-language/ and
  • https://www.exploringbinary.com/print-precision-of-floating-point-integers-varies-too/.
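
As a quick demonstration of that claim (assuming glibc or another C library with exact conversion, such as one built on David Gay’s dtoa): the smallest positive double is 2^-1074, so 1074 fractional digits always suffice to show a double’s fractional part exactly.

    #include <stdio.h>

    int main(void) {
        /* 1074 fractional digits cover even the smallest subnormal;
           trailing zeros can be trimmed afterward. */
        printf("%.1074f\n", 0.1);
        return 0;
    }

On glibc this prints 0.1000000000000000055511151231257827021181583404541015625 followed by trailing zeros; some older C libraries instead round after a fixed number of digits and pad with zeros.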
answered Oct 09 '22 by Rick Regan