
How do I print a floating-point value for later scanning with perfect accuracy?

Tags: c, floating-point, precision, numeric

Suppose I have a floating-point value of type float or double (i.e. 32 or 64 bits on typical machines). I want to print this value as text (e.g. to the standard output stream), and then later, in some other process, scan it back in - with fscanf() if I'm using C, or perhaps with istream::operator>>() if I'm using C++. But I need the scanned float to end up being exactly identical to the original value (up to equivalent representations of the same value). Also, the printed value should be easily readable - to a human - as floating-point, i.e. I don't want to print 0x42355316 and reinterpret that as a 32-bit float.

How should I do this? I'm assuming the standard libraries (of C and C++) won't be sufficient, but perhaps I'm wrong. I suppose that a sufficient number of decimal digits might be able to guarantee an error that's underneath the precision threshold - but that's not the same as guaranteeing the rounding/truncation will happen just the way I want it.

Notes:

The scanning does not have to be perfectly accurate w.r.t. the value it scans, only the original value.
If it makes it easier, you may assume the value is a number and is not infinity.
denormal support is desired but not required; still if we get a denormal, failure should be conspicuous.


asked Jul 19 '20 15:07

einpoklum

2 Answers

First, you should use the %a format with fprintf and fscanf. This is what it was designed for, and the C standard requires it to work (reproduce the original number) if the implementation uses binary floating-point.

Failing that, you should print a float with at least FLT_DECIMAL_DIG significant digits and a double with at least DBL_DECIMAL_DIG significant digits. Those constants are defined in <float.h> and are defined as:

… number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,… [b is the base used for the floating-point format, defined in FLT_RADIX, and p is the number of base-b digits in the format.]

For example:

    printf("%.*g\n", FLT_DECIMAL_DIG, 1.f/3);

or:

#define QuoteHelper(x)  #x
#define Quote(x)        QuoteHelper(x)
…
    printf("%." Quote(FLT_DECIMAL_DIG) "g\n", 1.f/3);

In C++, these constants are defined in <limits> as std::numeric_limits<Type>::max_digits10, where Type is float or double or another floating-point type.

Note that the C standard only recommends that such a round-trip through a decimal numeral work; it does not require it. For example, C 2018 5.2.4.2.2 15 says, under the heading “Recommended practice”:

Conversion from (at least) double to decimal with DECIMAL_DIG digits and back should be the identity function. [DECIMAL_DIG is the equivalent of FLT_DECIMAL_DIG or DBL_DECIMAL_DIG for the widest floating-point format supported in the implementation.]

In contrast, if you use %a, and FLT_RADIX is a power of two (meaning the implementation uses a floating-point base that is two, 16, or another power of two), then the C standard requires that the result of scanning the numeral produced with %a equals the original number.

answered Nov 03 '22 11:11

Eric Postpischil

I need the scanned float to end up being exactly, identical to the original value.

As already pointed out in the other answers, that can be achieved with the %a format specifier.

Also, the printed value should be easily readable - to a human - as floating-point, i.e. I don't want to print 0x42355316 and reinterpret that as a 32-bit float.

That's more tricky and subjective. The first part of the string that %a produces is in fact a fraction composed of hexadecimal digits, so an output like 0x1.4p+3 may take some time to be parsed as 10 by a human reader.

An option could be to print all the decimal digits needed to represent the floating-point value, but there may be a lot of them. Consider, for example, the value 0.1: its closest representation as a 64-bit float may be

0x1.999999999999ap-4  ==  0.1000000000000000055511151231257827021181583404541015625

While printf("%.*lf\n", DBL_DECIMAL_DIG, 0.1); (see e.g. Eric's answer) would print

0.10000000000000001   // If DBL_DECIMAL_DIG == 17

My proposal is somewhere in the middle. Similarly to what %a does, we can exactly represent any floating-point value with radix 2 as a fraction multiplied by 2 raised to some integer power. We can transform that fraction into a whole number (increasing the exponent accordingly) and print it as a decimal value.

0x1.999999999999ap-4 --> 1.999999999999a₁₆ * 2^-4  --> 1999999999999a₁₆ * 2^-56 
                     --> 7205759403792794₁₀ * 2^-56

That whole number has a limited number of digits (it's < 2⁵³), but the result is still an exact representation of the original double value.

The following snippet is a proof of concept, without any check for corner cases. The format specifier %a separates the mantissa and the exponent with a p character (as in "... multiplied by two raised to the Power of..."). I'll use a q instead, for no particular reason other than using a different symbol.

The value of the mantissa will also be reduced (and the exponent raised accordingly), removing all the trailing zero-bits. The idea being that 5q+1 (parsed as 5₁₀ * 2¹) should be more "easily" identified as 10, rather than 2814749767106560q-48.

#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

void to_my_format(double x, char *str)
{
    int exponent;
    // frexp() returns a fraction in [0.5, 1), so all DBL_MANT_DIG (53 for
    // a binary64 double) significant bits lie right of the binary point
    double mantissa = frexp(x, &exponent);
    long long m = 0;
    if ( mantissa ) {
        // Scale by 2^DBL_MANT_DIG so that the fraction becomes a whole number
        exponent -= DBL_MANT_DIG;
        m = (long long)scalbn(mantissa, DBL_MANT_DIG);
        // A reduced mantissa should be more readable
        while (m  &&  m % 2 == 0) {
            ++exponent;
            m /= 2;
        }
    }
    sprintf(str, "%lldq%+d", m, exponent);
    //                ^
    // Here 'q' is used to separate the mantissa from the exponent
}

double from_my_format(char const *str)
{
    char *end;
    long long mantissa = strtoll(str, &end, 10);
    // Skip the 'q' separator, then read the power-of-two exponent
    long exponent = strtol(end + 1, &end, 10);
    return scalbln((double)mantissa, exponent);
}

int main(void)
{
    double tests[] = { 1, 0.5, 2, 10, -256, acos(-1), 1000000, 0.1, 0.125 };
    size_t n = (sizeof tests) / (sizeof *tests);
    
    char num[32];
    for ( size_t i = 0; i < n; ++i ) {
        to_my_format(tests[i], num);
        double x = from_my_format(num);
        printf("%22s%22a ", num, tests[i]);
        if ( tests[i] != x )
            printf(" *** %22a *** Round-trip failed\n", x);
        else
            printf("%58.55g\n", x);
    }
    return 0;
}

Testable here.

Generally, the improvement in readability is admittedly little to none, and surely a matter of opinion.

answered Nov 03 '22 12:11

Bob__
