Rounding issue in all.equal

Tags: r

I am reading through The R Inferno, and have run into something I do not understand. In addition to section 8.2.23 in the Inferno, there have been some good questions on comparing floating point numbers: question1, question2.

However, I am still running into a problem using all.equal. Using the default all.equal I get the results (mostly) as I would expect.

> all.equal(2,1.99999997)
[1] "Mean relative difference: 1.5e-08"
> all.equal(2,1.99999998) #I expected FALSE here
[1] TRUE
> all.equal(2,1.99999999)
[1] TRUE

I am not sure why at 1.99999998 the function returns TRUE, but that is not as concerning as the following behavior where I specified the tolerance level:

> all.equal(2,1.98,tolerance=0.01) #Behaves as expected
[1] "Mean relative difference: 0.01"
> all.equal(2,1.981,tolerance=0.01) #Does not behave as expected
[1] TRUE

Furthermore,

> all.equal(2,1.980000000001,tolerance=0.01)
[1] TRUE 

But if we compute:

> diff(c(1.981,2))
[1] 0.019

and clearly,

> diff(c(1.981,2)) >= 0.01
[1] TRUE

So, why is all.equal unable to distinguish 2 and 1.981 with a tolerance of 0.01?

EDIT

From the documentation: Numerical comparisons for scale = NULL (the default) are done by first computing the mean absolute difference of the two numerical vectors. If this is smaller than tolerance or not finite, absolute differences are used, otherwise relative differences scaled by the mean absolute difference.

Here I do not understand the behavior. I can see that diff(c(1.981,2)) is not finite:

> sprintf("%.25f",diff(c(1.981,2)))
[1] "0.0189999999999999058530875"

But then what does it get scaled by? When each vector is of length one, the mean absolute difference should equal the difference of the two numbers, and dividing by the mean absolute difference would give 1. Clearly, I am understanding the logic here wrong.

dayne asked Sep 13 '13 19:09


2 Answers

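This has less to do with floating point accuracy than with how all.equal scales the difference. The manual isn't entirely clear at first glance. With scale = NULL (the default), all.equal first computes the mean absolute difference, which here is 2 - 1.981 = 0.019. It then divides that by the mean absolute value of target (here 2), unless that value is smaller than tolerance or not finite, in which case the absolute difference is used as is. So, despite the manual's wording, the comparison made here is the relative difference: the mean absolute difference scaled by the size of target, not by the mean absolute difference itself.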

Using tolerance implies that you care about the magnitude of the numbers involved. A relative difference measures not how big the difference is in absolute terms, but how big it is relative to the numbers being compared. For example, the difference between 5 and 6 is more significant (I use the term loosely) than the difference between 1,000,000,000 and 1,000,000,001.
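
To make that concrete, here is a small sketch using the two pairs of numbers just mentioned. The helper name rel_diff is made up for illustration, and it assumes the difference is measured relative to the first number (the target).

```r
# Hypothetical helper: absolute difference scaled by the size of the target.
rel_diff <- function(target, current) abs(target - current) / abs(target)

rel_diff(5, 6)          # 0.2
rel_diff(1e9, 1e9 + 1)  # 1e-09

# Both pairs differ by exactly 1 in absolute terms, yet with the default
# tolerance (about 1.5e-8) only the second pair is treated as equal.
isTRUE(all.equal(5, 6))          # FALSE
isTRUE(all.equal(1e9, 1e9 + 1))  # TRUE
```

Wrapping the call in isTRUE() is the usual way to turn the result of all.equal into a plain TRUE/FALSE, since on failure it returns a message rather than FALSE.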

So if the relative difference between the two numbers is less than tolerance the numbers are considered equal. For two single numbers (as in this example) the relative difference is given by:

abs( target - current ) / abs( target )

Which is

( 2 - 1.981 ) / 2 = 0.0095

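The tolerance you specified is 0.01, so the numbers are considered equal because the relative difference is less than this. As a curiosity, the difference between these numbers ± the relative difference happens to be exactly .Machine$double.eps. That is the machine epsilon (the smallest positive x for which 1 + x != 1), which is not the same thing as the smallest representable floating point number: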

> identical( abs( ( 2 - 0.0095 ) - ( 1.981 + 0.0095 ) ) , .Machine$double.eps )
[1] TRUE

Now try:

> all.equal( 2 , 1.981 , tolerance = 0.00949999999999 )
[1] "Mean relative difference: 0.0095"
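
For reference, here is a rough sketch of what seems to happen for plain numeric vectors when scale = NULL: the mean absolute difference is divided by the mean absolute value of target, unless that value is smaller than tolerance or not finite. The name sketch_all_equal is made up for illustration. It is a simplification, not the actual source of all.equal, which also deals with attributes, missing values, complex numbers and more.

```r
# Simplified sketch of the scale = NULL comparison for numeric vectors.
# sketch_all_equal is a hypothetical name, not part of base R.
sketch_all_equal <- function(target, current,
                             tolerance = sqrt(.Machine$double.eps)) {
  xy <- mean(abs(target - current))  # mean absolute difference
  xn <- mean(abs(target))            # mean absolute size of target
  if (is.finite(xn) && xn > tolerance) {
    xy <- xy / xn                    # switch to the relative difference
  }                                  # otherwise stay on the absolute scale
  xy <= tolerance                    # TRUE means "considered equal"
}

sketch_all_equal(2, 1.99999997)                 # FALSE: relative difference about 1.5e-8
sketch_all_equal(2, 1.99999998)                 # TRUE:  relative difference about 1e-8
sketch_all_equal(2, 1.981, tolerance = 0.01)    # TRUE:  0.019 / 2 = 0.0095
sketch_all_equal(2, 1.981, tolerance = 0.009)   # FALSE: 0.0095 > 0.009
```

The default tolerance is sqrt(.Machine$double.eps), roughly 1.49e-8. That is why 1.99999998 (relative difference 1e-8) passes with the default settings while 1.99999997 (relative difference 1.5e-8) does not.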
Simon O'Hanlon answered Nov 18 '22 06:11



This happens because in this case all.equal checks relative differences. If you set scale=1, i.e. no scaling, absolute comparisons will be made and all.equal behaves as you are expecting.

For further details see the documentation on the scale parameter.

> all.equal(2,1.980000000001,tolerance=0.01)
[1] TRUE
> all.equal(2,1.980000000001,tolerance=0.01,scale=1)
[1] "Mean scaled difference: 0.02"
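
As a quick check of the two scales, the snippet below computes both differences by hand and then reduces each all.equal call to a plain TRUE/FALSE with isTRUE(). The variable names are only for illustration. The exact wording of the message returned with scale = 1 may differ between R versions, so only the TRUE/FALSE outcome is shown.

```r
target  <- 2
current <- 1.981

abs(target - current)           # about 0.019, larger than the tolerance
abs(target - current) / target  # about 0.0095, smaller than the tolerance

# Relative comparison (default, scale = NULL): considered equal.
isTRUE(all.equal(target, current, tolerance = 0.01))             # TRUE

# Absolute comparison (scale = 1): not considered equal.
isTRUE(all.equal(target, current, tolerance = 0.01, scale = 1))  # FALSE
```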
ROLO answered Nov 18 '22 08:11