C fundamentals: double variable not equal to double expression?

I am working with an array of doubles called indata (on the heap, allocated with malloc) and a local double called sum.

I wrote two different functions to compare values in indata, and obtained different results. Eventually I determined that the discrepancy was due to one function using an expression in a conditional test, and the other function using a local variable in the same conditional test. I expected these to be equivalent.

My function A uses:

    if (indata[i]+indata[j] > max) hi++;

and my function B uses:

    sum = indata[i]+indata[j];
    if (sum>max) hi++;

After going through the same data set and max, I end up with different values of hi depending on which function I use. I believe function B is correct and function A is misleading. Similarly, when I try the snippet below,

    sum = indata[i]+indata[j];
    if ((indata[i]+indata[j]) != sum) { /* ... */ }

that conditional will evaluate to true.

While I understand that floating-point numbers do not necessarily provide an exact representation, why does that inexact representation change when the value is evaluated as an expression vs. stored in a variable? Is it recommended best practice to always store a double expression like this in a variable before using it in a conditional? Thanks!

stilllearning asked Jun 04 '16 05:06



1 Answer

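I suspect you're using 32-bit x86, the only common architecture subject to excess precision. In C, expressions of type float and double may be evaluated with the range and precision of float_t and double_t, and how those types relate to float and double is reported by the FLT_EVAL_METHOD macro. For x86 code that uses the x87 FPU, both are long double (FLT_EVAL_METHOD is 2), because the x87 is not actually capable of performing arithmetic at single or double precision. (It has precision-control bits intended to allow that, but they shorten only the significand and keep the extended exponent range, so the results are still not true double arithmetic and can't be relied on.) That is why your two functions disagree: in function A, indata[i]+indata[j] is compared against max with the extra precision still present, while in function B the sum is first rounded to double when it is stored in sum.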

Assigning to an object of type float or double is one way to force rounding and get rid of the excess precision, but you can also add a gratuitous (double) cast if you prefer to keep it as an expression without an assignment.

Note that forcing rounding to the desired precision is not equivalent to performing the arithmetic at the desired precision. Instead of one rounding step (during the arithmetic) you now have two (during the arithmetic, and again to drop the unwanted precision), and when the first rounding lands exactly on the midpoint between two representable doubles, the second rounding can go in the 'wrong' direction. This issue is generally called double rounding, and it makes excess precision significantly worse than nominal precision for certain kinds of calculations.

R.. GitHub STOP HELPING ICE answered Nov 06 '22 05:11