Nonintuitive result of the assignment of a double precision number to an int variable in C

Could someone give me an explanation why I get two different numbers, resp. 14 and 15, as an output from the following code?

#include <stdio.h>

int main() {
    double Vmax = 2.9;
    double Vmin = 1.4;
    double step = 0.1;

    double a = (Vmax-Vmin)/step;
    int b = (Vmax-Vmin)/step;
    int c = a;

    printf("%d  %d",b,c);  // 14 15, why?
    return 0;
}

I expect to get 15 in both cases but it seems I'm missing some fundamentals of the language.

I am not sure if it's relevant, but I ran the test in Code::Blocks. However, if I type the same lines of code into an online compiler (this one, for example), I get 15 for both printed variables.

GeorgiD asked Feb 27 '18 15:02



1 Answer

... why I get two different numbers ...

Aside from the usual floating-point issues, b and c are computed along different paths: c is calculated by first saving the value as double a, whereas b is converted directly from the expression.

double a = (Vmax-Vmin)/step;
int b = (Vmax-Vmin)/step;
int c = a;

C allows intermediate floating-point math to be computed using wider types. Check the value of FLT_EVAL_METHOD from <float.h>.

Except for assignment and cast (which remove all extra range and precision), ...

-1 indeterminable;

0 evaluate all operations and constants just to the range and precision of the type;

1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;

2 evaluate all operations and constants to the range and precision of the long double type.

C11dr §5.2.4.2.2 9

OP reported a FLT_EVAL_METHOD value of 2.

By saving the quotient in double a = (Vmax-Vmin)/step;, precision is forced to double whereas int b = (Vmax-Vmin)/step; could compute as long double.

This subtle difference results from (Vmax-Vmin)/step (computed perhaps as long double) being rounded to double in one case and remaining a long double in the other. One value is 15 (or just above), and the other is just under 15. int truncation amplifies this difference to 15 and 14.

On another compiler, the two results may well be the same due to FLT_EVAL_METHOD < 2 or other floating-point characteristics.


Conversion to int from a floating-point number is severe for values near a whole number, because the fractional part is simply discarded. It is often better to use round() or lround(). The best solution is situation dependent.

chux - Reinstate Monica answered Oct 18 '22 17:10