How can a fp with higher mantissa represent a smaller number?

Question

I love FP; every time I think to had got it, I understand to know nothing about it :)

This is an example that I don't understand. I sum 8 time the same number (0.1) and I print the result, of both sum and "original":

std::cout.precision(100);

int numIteration = 8;
double step = 0.1;
double sum = 0.0;

for(int i = 0; i < numIteration; i++) {
    sum += step;
}

std::cout << "orig stored as " << numIteration / 10.0 << std::endl;
std::cout << " sum stored as " << sum << std::endl;

0.1 is stored as 0.1000000000000000055511151231257827021181583404541015625, and I expect that after 8 sum, it will be stored bigger or equal of 0.8, which is stored as 0.8000000000000000444089209850062616169452667236328125.

But the result shock me. In fact after 8 sum, the result is 0.79999999999999993338661852249060757458209991455078125, which is smaller.

Also, if I check the binary output of both, I can see that the sum is "higher" than the "original":

0.8 stored as binary 0 01111111110 1001100110011001100110011001100110011001100110011001 // smaller
sum stored as binary 0 01111111110 1001100110011001100110011001100110011001100110011010 // higher

But 0.79999999999999993338661852249060757458209991455078125 < 0.8000000000000000444089209850062616169452667236328125.

Can you shine me?

EDIT: sorry to everybody, I had a mistake on copy/paste the binary. They were correct.

AMA · Accepted Answer

With IEEE floating-point rounding happens after every arithmetic operation. And the rounding could go up or down. If you print the value of sum at every iteration you should see:

sum is 0.1000000000000000055511151231257827021181583404541015625
sum is 0.200000000000000011102230246251565404236316680908203125
sum is 0.3000000000000000444089209850062616169452667236328125
sum is 0.40000000000000002220446049250313080847263336181640625
sum is 0.5
sum is 0.59999999999999997779553950749686919152736663818359375
sum is 0.6999999999999999555910790149937383830547332763671875
sum is 0.79999999999999993338661852249060757458209991455078125

You assume that the rounding could only go up. But, since "Round to nearest, ties to even" is the default rounding-mode in IEEE 754, the nearest binary-representable value is picked at every iteration, therefore the result is not necessary larger than 0.8.

On the other hand

std::cout << 0.1 * 8.0 << std::endl;

Will produce the expected

0.8000000000000000444089209850062616169452667236328125

Update: as @Evg mentioned in the comment, floating-point rounding direction can be changed with std::fesetround.

Evg · Answer

Your binary representations are wrong. The correct ones are:

sum = 0.79999999999999993 ... = 
0b0011111111101001100110011001100110011001100110011001100110011001

numIteration / 10.0 = 0.80000000000000004... = 
0b0011111111101001100110011001100110011001100110011001100110011010

Martin Bonner supports Monica · Answer

In general, there is a problem when you add a small increment to a large sum. There is not enough precision to store the full result, and some significance is lost. By the last iteration of the loop you have started to run into this.

For a sufficiently large sum and small increment, the sum may not change at all.

Arne Vogel · Answer

While AMA's answer is correct in that rounding happens after every addition, the same surprises can happen even for just a single operation (including multiplication):

#include <iostream>

int main()
{
     const auto val1 = 0.3444444444444444
              , val2 = 0.34444444444444442;
     std::cout << (2*val1) << '
'
               << (2*val2) << '
';
}

(Unless otherwise mentioned, I assume IEEE doubles with standard rounding behavior.)

The first line will show 0.6888888888888888 (if you trust me to do the counting for you, it's 15x 4 in the input, and 15x 8 in the output) with no surprises. We'd assume that the second line shows either an additional digit, hopefully somewhat near 4, or that the result is unchanged.

In reality, however, the second line will show 0.6888888888888889. That's a surprise, how can a 4 on the last digit be rounded up on the next higher digit? This contradicts our notion that inequalities are maintained when a positive scaling factor is applied on both sides. I.e. since 2 < 2.5, then 2*2 < 2*2.5, then 4<5. This means, since it would take a last digit of 5 to round up (in the decimal system) in 2*val2, that val2 would intuitively have to be at least 0.344444444444444425 for upward rounding.

The problem here is that every number system has different rounding of inputs and outputs. In fact, no rounding even occurs in binary as a result of the multiplication itself, however rounding occurs in both number system conversions. Binary representations of the inputs:

0.01011000001011011000001011011000001011011000001011001 (val1) 0.01011000001011011000001011011000001011011000001011011 (val2)

A multiplication by 2 is just a left shift by 1, of course, in binary, which includes floating point (at least if we ignore the possibility of overflow), so the outputs are:

0.10110000010110110000010110110000010110110000010110010 (2*val1) 0.10110000010110110000010110110000010110110000010110110 (2*val2)

The latter converts back to 0.68888888888888888395… (note there is an additional 8 now), which is correctly rounded to 0.68888888888888889.

In this particular case, the original cause of the surprising behavior is that val2 actually becomes:

0.3444444444444444419772821675

also with an additional 4 which replaces the trailing 2 we entered, and that, when doubled, causes upward rounding to occur in decimal.

How can a fp with higher mantissa represent a smaller number?

Tags:

c++

floating-point

markzzz

4 Answers

AMA

Evg

Martin Bonner supports Monica

Arne Vogel

Recent Activity

Donate For Us

How can a fp with higher mantissa represent a smaller number?

Tags:

c++

floating-point

markzzz

4 Answers

AMA

Evg

Martin Bonner supports Monica

Arne Vogel

Related questions

Recent Activity

Donate For Us