
Why does the order affect the rounding when adding multiple doubles in C#?

Consider the following C# code:

double result1 = 1.0 + 1.1 + 1.2;
double result2 = 1.2 + 1.0 + 1.1;

if (result1 == result2)
{
    ...
}

result1 should always equal result2, right? The thing is, it doesn't: result1 is 3.3 and result2 is 3.3000000000000003. The only difference is the order of the constants.

I know that doubles are implemented in such a way that rounding issues can occur, and I'm aware that I can use decimals instead if I need absolute precision, or use Math.Round() in my if statement. I'm just a nerd who wants to understand what the C# compiler is doing. Can anyone tell me?
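
For what it's worth, the decimal version does compare equal, since 1.1 and 1.2 are exactly representable in base 10:

decimal d1 = 1.0m + 1.1m + 1.2m;
decimal d2 = 1.2m + 1.0m + 1.1m;

Console.WriteLine(d1 == d2); // True: both sums are exactly 3.3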

Edit:

Thanks to everyone who has so far suggested reading up on floating point arithmetic and/or talked about the inherent inaccuracy of how the CPU handles doubles, but I feel the main thrust of my question is still unanswered, which is my fault for not phrasing it correctly. Let me put it like this:

Breaking down the above code, I would expect the following operations to be happening:

double r1 = 1.1 + 1.2;
double r2 = 1.0 + r1; // result1
double r3 = 1.0 + 1.1;
double r4 = 1.2 + r3; // result2

Let's assume that each of the above additions had a rounding error (numbered e1..e4). So r1 contains rounding error e1, r2 includes rounding errors e1 + e2, r3 contains e3 and r4 contains e3 + e4.

Now, I don't know exactly how the rounding errors happen, but I would have expected e1 + e2 to equal e3 + e4. Clearly that isn't the case, but it seems somehow wrong to me. Another thing is that when I run the above code, I don't get any rounding errors. That's what makes me think it's the C# compiler that's doing something weird rather than the CPU.

I know I'm asking a lot, and maybe the best answer anyone can give is to go and do a PhD in CPU design, but I just thought I'd ask.

Edit 2

Looking at the IL from my original code sample, it's clear that it's the compiler not the CPU that's doing this:

.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] float64 result1,
        [1] float64 result2)
    L_0000: nop 
    L_0001: ldc.r8 3.3
    L_000a: stloc.0 
    L_000b: ldc.r8 3.3000000000000003
    L_0014: stloc.1 
    L_0015: ret 
}

The compiler is adding up the numbers for me!
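
For anyone else digging into this: pulling the operands out of an array stops the folding, and a sketch like the following should show the same two values at run time, since the compiler folds constants using the same IEEE 754 double arithmetic the CPU uses:

double[] v = { 1.0, 1.1, 1.2 };

// Same left-to-right groupings as the original snippet, but the
// operands are no longer compile-time constants.
double result1 = v[0] + v[1] + v[2]; // (1.0 + 1.1) + 1.2
double result2 = v[2] + v[0] + v[1]; // (1.2 + 1.0) + 1.1

Console.WriteLine(result1.ToString("R")); // 3.3
Console.WriteLine(result2.ToString("R")); // 3.3000000000000003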

asked Mar 30 '09 by d4nt

2 Answers

I would have expected e1+e2 to equal e3+e4.

That's not entirely unlike expecting

 floor( 5/3 ) + floor( 2/3 + 1 )

to equal

 floor( 5/3 + 2/3 ) + floor( 1 )

except you're multiplying by 2^53 before taking the floor. (The first works out to floor(1.67) + floor(1.67) = 1 + 1 = 2; the second to floor(2.33) + floor(1) = 2 + 1 = 3. The same values, rounded at different points, give different results.)

Floor is truncation; IEEE doubles actually round to nearest (ties to even), so let's model that. Using 12-bit precision floating point, round-to-nearest, and your values:

1.0            =  1.00000000000
1.1            =  1.00011001101 // rounded up; the exact value repeats forever
1.2            =  1.00110011010 // rounded up

1.0 + 1.1      = 10.00011001101 // exact sum, needs 13 bits
r1 = 1.0 + 1.1 = 10.0001100110  // exactly halfway; tie rounds DOWN to even
r1  + 1.2      = 11.01001100110 // exact sum, needs 13 bits
r2 = r1  + 1.2 = 11.0100110011  // dropped bit is 0, so nothing is lost

1.1 + 1.2      = 10.01001100111 // exact sum, needs 13 bits
r3 = 1.1 + 1.2 = 10.0100110100  // exactly halfway; tie rounds UP to even
r3  + 1.0      = 11.0100110100  // exact sum, already fits in 12 bits
r4 = r3  + 1.0 = 11.0100110100

So changing the order of operations changes where the rounding happens, and r4 != r2. If you add 1.0 and 1.1 first, the halfway case rounds down, because the last kept bit is already even. If you add 1.1 and 1.2 first, the halfway case rounds up, because the last kept bit is odd.

In one ordering, the rounding moves the result down.

In the other ordering, the rounding moves the result up.

Down does not equal up; so the errors are not the same.

Doubles have 53 bits of precision rather than 12, but the mechanism is identical, and hopefully this simple model shows you how different orderings of the same values can produce different errors.

The difference between floating point and maths is that + is shorthand for 'add, then round the result to the representable precision' rather than just add.
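
With real doubles you can see that the two results differ by exactly one bit; a quick sketch using BitConverter to dump the raw bit patterns (the hex values are what IEEE 754 arithmetic should produce):

double a = 1.0 + 1.1 + 1.2;
double b = 1.2 + 1.0 + 1.1;

// The bit patterns differ by one unit in the last place (ulp)
// of the 53-bit significand.
Console.WriteLine(BitConverter.DoubleToInt64Bits(a).ToString("X16")); // 400A666666666666
Console.WriteLine(BitConverter.DoubleToInt64Bits(b).ToString("X16")); // 400A666666666667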

answered by Pete Kirkham


The C# compiler isn't doing anything. The CPU is.

If you have A in a CPU register and you then add B, the result stored in that register is A+B, approximated to the floating-point precision in use.

If you then add C, the errors accumulate. Floating-point addition is not associative, hence the final difference.
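
A quick way to see that this accumulation depends on grouping (a sketch using the classic 0.1/0.2/0.3 values):

// Grouping changes which intermediate sums get rounded.
double p = 0.1, q = 0.2, r = 0.3;

double x = (p + q) + r; // 0.1 + 0.2 rounds up before 0.3 is added
double y = p + (q + r); // the stored 0.2 + 0.3 happens to sum to exactly 0.5

Console.WriteLine(x == y);          // False
Console.WriteLine(x.ToString("R")); // 0.6000000000000001
Console.WriteLine(y.ToString("R")); // 0.6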

answered by Brann