Rounding error using the floor function in C++

Question

I was asked what will be the output of the following code:

floor((0.7+0.6)*10);

It returns 12.

I know that floating-point representation cannot represent every number with infinite precision and that I should expect some discrepancies.

My questions are:

How should I know that this piece of code returns 12, not 13? Why is (0.7+0.6)*10 a bit less than 13, not a bit more?
When can I expect the floor function to work incorrectly, and when does it work correctly for sure?

Note: I'm not asking what floating-point representation looks like or why the output isn't exactly 13. I'd like to know how I should infer that (0.7+0.6)*10 is a bit less than 13.

Pascal Cuoq · Accepted Answer

How should I know that this piece of code returns 12, not 13? Why is (0.7+0.6)*10 a bit less than 13, not a bit more?

Assume that your compilation platform strictly uses the IEEE 754 double-precision format and operations, with the default round-to-nearest mode. Then convert all the constants involved to binary, rounding each to 53 significant bits, and apply the basic operations as IEEE 754 defines them: compute the exact mathematical result and round it to 53 significant bits at each step. A computer does not need to be involved at any stage, but you can make your life easier by using C99's hexadecimal floating-point format for input and output.

When can I expect the floor function to work incorrectly, and when does it work correctly for sure?

floor() is exact for all finite arguments, positive or negative. It is working correctly in your example. The behavior that surprises you does not originate with floor and has nothing to do with floor. It starts with the fact that 6/10 and 7/10 are not exactly representable as binary floating-point values, and continues with the fact that, because these values have long binary expansions, the floating-point operations + and * can return a result that is slightly rounded with respect to the mathematical result of the arguments they are actually applied to. floor() is the only place in your code that does not involve approximation.

Example program to see what is happening:

#include <stdio.h>
#include <math.h>

int main(void) {
  /* %a prints each double in C99 hexadecimal floating-point notation */
  printf("%a\n%a\n%a\n%a\n%a\n",
         0.7,
         0.6,
         0.7 + 0.6,
         (0.7+0.6)*10,
         floor((0.7+0.6)*10));
  return 0;
}

Result:

0x1.6666666666666p-1
0x1.3333333333333p-1
0x1.4ccccccccccccp+0
0x1.9ffffffffffffp+3
0x1.8p+3

IEEE 754 double precision is defined in binary, but for conciseness %a writes the significand in hexadecimal. The exponent after p is a power of two. For instance, the last two results are both of the form <number roughly halfway between 1 and 2> × 2³.

0x1.8p+3 is 12 (1.5 × 2³). The next integer, 13, is 0x1.ap+3 (1.625 × 2³), but the computation does not quite reach that value: 0x1.9ffffffffffffp+3 is the largest double below 13, so floor() rounds it down to 12.


Tags: c++, floating-point, rounding, precision

Asked by user2738748

