Computing floating point accuracy (K&R 2-1)

Tags: c, floating-point, floating-accuracy, kernighan-and-ritchie

I found Stevens Computing Services – K & R Exercise 2-1 to be a very thorough answer to K&R 2-1. This slice of the full code computes the maximum value of the float type in the C programming language.

Unfortunately, my theoretical understanding of float values is quite limited. I know they are composed of a significand (mantissa) and an exponent, which scales the significand by a power of 2.

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void)
{
    float flt_a, flt_b, flt_c, flt_m;

    /* FLOAT */
    printf("\nFLOAT MAX\n");
    printf("<float.h> %E  ", FLT_MAX);

    /* Double flt_b until adding 1.0 no longer changes the result. */
    flt_a = 2.0;
    flt_b = 1.0;
    while (flt_a != flt_b) {
        flt_m = flt_b;           /* MAX POWER OF 2 IN MANTISSA */
        flt_a = flt_b = flt_b * 2.0;
        flt_a = flt_a + 1.0;
    }
    flt_m = flt_m + (flt_m - 1); /* MAX VALUE OF MANTISSA */

    /* Double flt_a until halving no longer recovers the previous value. */
    flt_a = flt_b = flt_c = flt_m;
    while (flt_b == flt_c) {
        flt_c = flt_a;
        flt_a = flt_a * 2.0;
        flt_b = flt_a / 2.0;
    }
    printf("COMPUTED %E\n", flt_c);
    return 0;
}

I understand that the latter part basically checks how many times the significand can be doubled (that is, how large the power-of-2 exponent can get) using a three-variable algorithm. What about the first part?

I can see that a progression of powers of 2 should eventually determine the size of the significand, but when I traced a few small numbers to check how it should work, it failed to find the right values...

What are the concepts on which this program is based, and does this program get more precise as longer and non-integer numbers have to be found?

asked Feb 09 '15 19:02

maja

1 Answer

The first loop determines the number of bits contributing to the significand by finding the least power of 2 such that adding 1 to it (using floating-point arithmetic) fails to change its value. If that's the nth power of two, then the significand uses n bits, because with n bits you can express all the integers from 0 through 2^n - 1, but not 2^n. The floating-point representation of 2^n must therefore have an exponent large enough that the (binary) units digit is not significant. On a typical implementation, where float is an IEEE 754 single-precision value, n is 24.

By that same token, having found the first power of 2 whose float representation has worse than unit precision, the maximum float value that does have unit precision is one less. That value is recorded in variable flt_m by the assignment that follows the first loop.

The second loop then tests for the maximum exponent by starting with the maximum unit-precision value, and repeatedly doubling it (thereby increasing the exponent by 1) until it finds that the result cannot be converted back by halving it. The maximum float is the value before that final doubling.

Do note, by the way, that all the above supposes a base-2 floating-point representation. You are unlikely to run into anything different, but C does not actually require any specific representation.

With respect to the second part of your question,

does this program get more precise as longer and non-integer numbers have to be found?

the program takes care to avoid losing precision. It does assume a binary floating-point representation such as you described, but it will work correctly regardless of the number of bits in the significand or exponent of such a representation. No non-integers are involved, but the program already deals with numbers that have worse than unit precision, and with numbers larger than can be represented with type int.

answered Oct 11 '22 16:10

John Bollinger

