Suppose I have a floating-point value of type float
or double
(i.e. 32 or 64 bits on typical machines). I want to print this value as text (e.g. to the standard output stream), and then later, in some other process, scan it back in - with fscanf()
if I'm using C, or perhaps with istream::operator>>()
if I'm using C++. But - I need the scanned float to end up being exactly, identical to the original value (up to equivalent representations of the same value). Also, the printed value should be easily readable - to a human - as floating-point, i.e. I don't want to print 0x42355316 and reinterpret that as a 32-bit float.
How should I do this? I'm assuming the standard library of (C and C++) won't be sufficient, but perhaps I'm wrong. I suppose that a sufficient number of decimal digits might be able to guarantee an error that's underneath the precision threshold - but that's not the same as guaranteeing the rounding/truncation will happen just the way I want it.
Notes:
You can do it like this: printf("%. 6f", myFloat); 6 represents the number of digits after the decimal separator.
2223004 in float or double , divide 0 by 4, add 0, divide by 4, add 3, divide by 4, add 2, divide by 4, add 2, divide by 4, add 2, and divide by 4. The result will be exact, and you can then do arithmetic natively in float or double .
Use "%. By using this format specifier we can print specific number of digits after the decimal, here "n" is the number of digits after decimal point.
The data type float has 24 bits of precision. This is equivalent to only about 7 decimal places. (The rest of the 32 bits are used for the sign and size of the number.) The number of places of precision for float is the same no matter what the size of the number.
First, you should use the %a
format with fprintf
and fscanf
. This is what it was designed for, and the C standard requires it to work (reproduce the original number) if the implementation uses binary floating-point.
Failing that, you should print a float
with at least FLT_DECIMAL_DIG
significant digits and a double
with at least DBL_DECIMAL_DIG
significant digits. Those constants are defined in <float.h>
and are defined:
… number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,… [b is the base used for the floating-point format, defined in
FLT_RADIX
, and p is the number of base-b digits in the format.]
For example:
printf("%.*g\n", FLT_DECIMAL_DIG, 1.f/3);
or:
#define QuoteHelper(x) #x
#define Quote(x) QuoteHelper(x)
…
printf("%." Quote(FLT_DECIMAL_DIG) "g\n", 1.f/3);
In C++, these constants are defined in <limits>
as std::numeric_limits<Type>::max_digits10
, where Type
is float
or double
or another floating-point type.
Note that the C standard only recommends that such a round-trip through a decimal numeral work; it does not require it. For example, C 2018 5.2.4.2.2 15 says, under the heading “Recommended practice”:
Conversion from (at least)
double
to decimal withDECIMAL_DIG
digits and back should be the identity function. [DECIMAL_DIG
is the equivalent ofFLT_DECIMAL_DIG
orDBL_DECIMAL_DIG
for the widest floating-point format supported in the implementation.]
In contrast, if you use %a
, and FLT_RADIX
is a power of two (meaning the implementation uses a floating-point base that is two, 16, or another power of two), then C standard requires that the result of scanning the numeral produced with %a
equals the original number.
I need the scanned float to end up being exactly, identical to the original value.
As already pointed out in the other answers, that can be achieved with the %a
format specifier.
Also, the printed value should be easily readable - to a human - as floating-point, i.e. I don't want to print 0x42355316 and reinterpret that as a 32-bit float.
That's more tricky and subjective. The first part of the string that %a
produces is in fact a fraction composed by hexadecimal digits, so that an output like 0x1.4p+3
may take some time to be parsed as 10
by a human reader.
An option could be to print all the decimal digits needed to represent the floating-point value, but there may be a lot of them. Consider, for example the value 0.1, its closest representation as a 64-bit float may be
0x1.999999999999ap-4 == 0.1000000000000000055511151231257827021181583404541015625
While printf("%.*lf\n", DBL_DECIMAL_DIG, 01);
(see e.g. Eric's answer) would print
0.10000000000000001 // If DBL_DECIMAL_DIG == 17
My proposal is somewhere in the middle. Similarly to what %a
does, we can exactly represent any floating-point value with radix 2 as a fraction multiplied by 2 raised to some integer power. We can transform that fraction into a whole number (increasing the exponent accordingly) and print it as a decimal value.
0x1.999999999999ap-4 --> 1.999999999999a16 * 2-4 --> 1999999999999a16 * 2-56 --> 720575940379279410 * 2-56
That whole number has a limited number of digits (it's < 253), but the result it's still an exact representation of the original double
value.
The following snippet is a proof of concept, without any check for corner cases. The format specifier %a
separates the mantissa and the exponent with a p
character (as in "... multiplied by two raised to the Power of..."), I'll use a q
instead, for no particular reason other than using a different symbol.
The value of the mantissa will also be reduced (and the exponent raised accordingly), removing all the trailing zero-bits. The idea beeing that 5q+1
(parsed as 510 * 21) should be more "easily" identified as 10
, rather than 2814749767106560q-48
.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void to_my_format(double x, char *str)
{
int exponent;
double mantissa = frexp(x, &exponent);
long long m = 0;
if ( mantissa ) {
exponent -= 52;
m = (long long)scalbn(mantissa, 52);
// A reduced mantissa should be more readable
while (m && m % 2 == 0) {
++exponent;
m /= 2;
}
}
sprintf(str, "%lldq%+d", m, exponent);
// ^
// Here 'q' is used to separate the mantissa from the exponent
}
double from_my_format(char const *str)
{
char *end;
long long mantissa = strtoll(str, &end, 10);
long exponent = strtol(str + (end - str + 1), &end, 10);
return scalbn(mantissa, exponent);
}
int main(void)
{
double tests[] = { 1, 0.5, 2, 10, -256, acos(-1), 1000000, 0.1, 0.125 };
size_t n = (sizeof tests) / (sizeof *tests);
char num[32];
for ( size_t i = 0; i < n; ++i ) {
to_my_format(tests[i], num);
double x = from_my_format(num);
printf("%22s%22a ", num, tests[i]);
if ( tests[i] != x )
printf(" *** %22a *** Round-trip failed\n", x);
else
printf("%58.55g\n", x);
}
return 0;
}
Testable here.
Generally, the improvement in readability is admitedly little to none, surely a matter of opinion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With