Floating Point Arithmetic - Modulo Operator on Double Type

Question

So I'm trying to figure out why the modulo operator is returning such a large unusual value.

If I have the code:

double result = 1.0d % 0.1d;

it will give a result of 0.09999999999999995. I would expect a value of 0

Note this problem doesn't exist using the dividing operator - double result = 1.0d / 0.1d;

will give a result of 10.0, meaning that the remainder should be 0.

Let me be clear: I'm not surprised that an error exists, I'm surprised that the error is so darn large compared to the numbers at play. 0.0999 ~= 0.1 and 0.1 is on the same order of magnitude as 0.1d and only one order of magnitude away from 1.0d. Its not like you can compare it to a double.epsilon, or say "its equal if its < 0.00001 difference".

I've read up on this topic on StackOverflow, in the following posts one two three, amongst others.

Can anyone suggest explain why this error is so large? Any any suggestions to avoid running into the problems in the future (I know I could use decimal instead but I'm concerned about the performance of that).

Edit: I should specifically point out that I know that 0.1 is an infinitely repeating series of numbers in binary - does that have anything to do with it?

Chris Dodd · Accepted Answer

The error comes about because a double can't exactly represent 0.1 -- the closest it can represent is something like 0.100000000000000005551115123126. Now when you divide 1.0 by that it gives you a number slightly less than 10, but again a double can't exactly represent it, so it ends up rounding up to 10. But when you do the mod, it can give you that slightly less than 0.1 remainder.

since 0 = 0.1 mod 0.1, the actual error in the mod is 0.1 - 0.09999999... -- very small.

If you add the result of the % operator to 9 * 0.1, it will give you 1.0 again.

Edit

A bit more detail on the rounding stuff -- particularly as this problem is a good example of the perils of mixed precision.

The way a % b for floating point numbers is usually computed is as a - (b * floor(a/b)). The problem is that it may be done all at once with more internal precision than you'd get with those operations (and rounding the result to a fp number at each stage), so it might give you a different result. One example that a lot of people see is with the Intel x86/x87 hardware is using 80-bit precision for intermediate computations and only 64-bit precision for values in memory. So the value in b in the equation above is coming from memory and is thus a 64-bit fp number that's not quite 0.1 (thank dan04 for the exact value), so when it computes 1.0/0.1 it gets 9.99999999999999944488848768742172978818416595458984375 (rounded to 80 bits). Now if you round that to 64 bits, it would be 10.0, but if you keep the 80 bit internal and do the floor on it, it truncates to 9.0 and thus gets .0999999999999999500399638918679556809365749359130859375 as the final answer.

So in this case, you're seeing a large apparent error because you're using a noncontinuous step function (floor) which means that a very tiny difference in an internal value can push you over the step. But since mod is itself a noncontinuous step function thats to be expected and the real error here is 0.1-0.0999... as 0.1 is the discontinuous point in the range of the mod function.

Floating Point Arithmetic - Modulo Operator on Double Type

Tags:

c#

floating-point

precision

CrimsonX

1 Answers

Chris Dodd

Recent Activity

Donate For Us

Floating Point Arithmetic - Modulo Operator on Double Type

Tags:

c#

floating-point

precision

CrimsonX

1 Answers

Chris Dodd

Related questions

Recent Activity

Donate For Us