I am working with an array of doubles called indata (on the heap, allocated with malloc), and a local double called sum.
I wrote two different functions to compare values in indata, and obtained different results. Eventually I determined that the discrepancy was due to one function using an expression in a conditional test, and the other function using a local variable in the same conditional test. I expected these to be equivalent.
My function A uses:
if (indata[i]+indata[j] > max) hi++;
and my function B uses:
sum = indata[i]+indata[j];
if (sum>max) hi++;
After going through the same data set and max, I end up with different values of hi depending on which function I use. I believe function B is correct, and function A is misleading. Similarly, when I try the snippet below
sum = indata[i]+indata[j];
if ((indata[i]+indata[j]) != sum) etc.
that conditional will evaluate to true.
While I understand that floating point numbers do not necessarily provide an exact representation, why does that inexact representation change when evaluated as an expression vs. stored in a variable? Is the recommended best practice to always store the result of a double expression like this in a variable before using it in a conditional? Thanks!
A double is a 64-bit floating-point data type recognized by C, C++, C#, and many other programming languages. It can represent fractional as well as whole values, with about 15 significant decimal digits of precision in total, counting the digits both before and after the decimal point.
To store a double, the computer allocates 8 bytes (64 bits) of memory on typical platforms.
double and long double are two distinct floating-point types in languages such as C and C++. The main difference is that double represents a double-precision floating-point value, while long double represents an extended-precision floating-point value whose exact width depends on the platform.
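The sizes and precisions above vary by platform, so it can help to check them directly. The following is a minimal sketch that reads the limits from <float.h>; the function name print_fp_limits is made up for illustration.

```c
#include <float.h>
#include <stdio.h>

/* Print the storage size and precision of double and long double
 * as reported by this compiler and platform. */
void print_fp_limits(void)
{
    printf("double:      %zu bytes, %d mantissa bits, %d decimal digits\n",
           sizeof(double), DBL_MANT_DIG, DBL_DIG);
    printf("long double: %zu bytes, %d mantissa bits, %d decimal digits\n",
           sizeof(long double), LDBL_MANT_DIG, LDBL_DIG);
}
```

On many desktop systems the first line reports 8 bytes and 15 decimal digits, but the long double line differs between compilers.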
I suspect you're using 32-bit x86, the only common architecture subject to excess precision. In C, expressions of type float and double are actually evaluated as float_t or double_t, whose relationships to float and double are reflected in the FLT_EVAL_METHOD macro. In the case of x86, both are defined as long double because the FPU is not actually capable of performing arithmetic at single or double precision. (It has mode bits intended to allow that, but the behavior is slightly wrong and thus can't be used.)
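To see which evaluation method a given build uses, one option is to print FLT_EVAL_METHOD (from <float.h>) together with the sizes of float_t and double_t (from <math.h>). This is only a sketch; the function name report_eval_method is invented for this example.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Report how this compiler evaluates float and double expressions.
 * FLT_EVAL_METHOD:  0 = each type evaluated at its own precision
 *                   1 = float and double evaluated as double
 *                   2 = everything evaluated as long double
 *                  -1 = indeterminable */
void report_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("sizeof(float)  = %zu, sizeof(float_t)  = %zu\n",
           sizeof(float), sizeof(float_t));
    printf("sizeof(double) = %zu, sizeof(double_t) = %zu\n",
           sizeof(double), sizeof(double_t));
}
```

A value of 2 would be consistent with the excess-precision behavior described in the question; a value of 0 would suggest the cause lies elsewhere.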
Assigning to an object of type float or double is one way to force rounding and get rid of the excess precision, but you can also just add a gratuitous cast to (double) if you prefer to leave it as an expression without assignments.
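Both approaches can be sketched as follows. The function names, the loop over pairs with i < j, and the parameter n are assumptions made for illustration, since the question does not show the surrounding loop. Note also that some compilers only honor the cast or assignment in a standards-conforming mode (GCC, for example, has an -fexcess-precision option that governs this), so treat this as a sketch rather than a guarantee.

```c
#include <stddef.h>

/* Count pairs whose sum exceeds max, using a cast to drop excess precision. */
size_t count_hi_cast(const double *indata, size_t n, double max)
{
    size_t hi = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if ((double)(indata[i] + indata[j]) > max)
                hi++;
    return hi;
}

/* Same count, using an assignment to a double object to force rounding. */
size_t count_hi_assign(const double *indata, size_t n, double max)
{
    size_t hi = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            double sum = indata[i] + indata[j];
            if (sum > max)
                hi++;
        }
    return hi;
}
```

With a conforming compiler the two functions should return the same count for the same input, which is the equivalence the question expected.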
Note that forcing rounding to the desired precision is not equivalent to performing the arithmetic at the desired precision; instead of one rounding step (during the arithmetic) you now have two (during the arithmetic, and again to drop unwanted precision), and in cases where the first rounding lands on an exact midpoint, the second rounding can go in the 'wrong' direction. This issue is generally called double rounding, and it makes excess precision significantly worse than nominal precision for certain types of calculations.
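A concrete instance of double rounding can be sketched like this, assuming long double is the x87 80-bit format with a 64-bit mantissa (LDBL_MANT_DIG == 64). The function names and the operands are chosen for illustration: with a = 1.0 and b = 0x1.002p-53, the exact sum lies just above the midpoint between 1.0 and the next double, so a single rounding goes up, while rounding first to extended precision lands exactly on that midpoint and the second rounding then goes down to 1.0.

```c
#include <float.h>

/* Add at whatever precision the compiler uses for double expressions.
 * With FLT_EVAL_METHOD == 0 this is a single rounding to double. */
double add_direct(double a, double b)
{
    return a + b;
}

/* Add in long double, then round the result to double.
 * On targets with a 64-bit long double mantissa this rounds twice. */
double add_via_extended(double a, double b)
{
    long double wide = (long double)a + (long double)b;
    return (double)wide;
}
```

On platforms where long double has the same format as double, or a much wider one, the two functions would be expected to agree for these operands, so the difference only shows up under the stated assumption.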