I am reading through The R Inferno, and have run into something I do not understand. In addition to section 8.2.23 in the Inferno, there have been some good questions on comparing floating point numbers: question1, question2.
However, I am still running into a problem using all.equal. Using the default all.equal I get the results (mostly) as I would expect.
> all.equal(2,1.99999997)
[1] "Mean relative difference: 1.5e-08"
> all.equal(2,1.99999998) #I expected FALSE here
[1] TRUE
> all.equal(2,1.99999999)
[1] TRUE
I am not sure why the function returns TRUE at 1.99999998, but that is not as concerning as the following behavior, where I specified the tolerance level:
> all.equal(2,1.98,tolerance=0.01) #Behaves as expected
[1] "Mean relative difference: 0.01"
> all.equal(2,1.981,tolerance=0.01) #Does not behave as expected
[1] TRUE
Furthermore,
> all.equal(2,1.980000000001,tolerance=0.01)
[1] TRUE
But if we compute:
> diff(c(1.981,2))
[1] 0.019
and clearly,
> diff(c(1.981,2)) >= 0.01
[1] TRUE
So, why is all.equal unable to distinguish 2 and 1.981 with a tolerance of 0.01?
EDIT
From the documentation: Numerical comparisons for scale = NULL (the default) are done by first computing the mean absolute difference of the two numerical vectors. If this is smaller than tolerance or not finite, absolute differences are used, otherwise relative differences scaled by the mean absolute difference.
Here I do not understand the behavior. I can see that diff(c(1.981,2)) is not finite:
> sprintf("%.25f",diff(c(1.981,2)))
[1] "0.0189999999999999058530875"
But then what does it get scaled by? When each vector is of length one, the mean absolute difference should equal the difference of the two numbers, and dividing by the mean absolute difference would give 1. Clearly, I am understanding the logic here wrong.
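This has to do with how all.equal measures the difference, more than with floating point accuracy as such. The manual isn't entirely clear at first glance, but in your example the mean absolute difference of 2 and 1.981 is 0.019, which is greater than the tolerance of 0.01, and scale is NULL. Therefore the comparison is made on the relative difference: the mean absolute difference divided by the mean absolute value of the target, which is not what the manual's phrase "scaled by the mean absolute difference" suggests. Eh?!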
Using tolerance implies that you care about the magnitude of the numbers involved. A relative difference measures not how big the difference is in absolute terms, but how big it is relative to the numbers being compared. Given the example in the link, the difference between 5 and 6 is more significant (I use the term loosely) than the difference between 1,000,000,000 and 1,000,000,001.
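To make the 5 vs. 6 example concrete, here is a small sketch that computes both kinds of difference for the two pairs. The variable names are mine, chosen only for illustration.

```r
# Absolute vs. relative difference for the two pairs mentioned above.
target  <- c(5, 1e9)
current <- c(6, 1e9 + 1)

abs_diff <- abs(target - current)    # identical for both pairs
rel_diff <- abs_diff / abs(target)   # very different for the two pairs

print(data.frame(target, current, abs_diff, rel_diff))

# With the default tolerance, all.equal treats only the second pair as equal.
isTRUE(all.equal(5, 6))
isTRUE(all.equal(1e9, 1e9 + 1))
```

The absolute difference is 1 in both cases, but the relative difference is 0.2 for the first pair and about 1e-09 for the second, which is below the default tolerance of roughly 1.5e-08.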
So if the relative difference between the two numbers is less than tolerance, the numbers are considered equal. For two single numbers (as in this example, where target is 2 and current is 1.981) the relative difference is given by:
abs( target - current ) / abs( target )
Which is
( 2 - 1.981 ) / 2 = 0.0095
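The tolerance you specified is 0.01, so the numbers are considered equal because the relative difference is less than this. As a curiosity, the difference between these numbers ± the relative difference also just happens to equal .Machine$double.eps, the machine epsilon (the gap between 1 and the next larger double), which is not the same thing as the smallest representable floating point number: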
identical( abs( ( 2 - 0.0095 ) - ( 1.981 + 0.0095 ) ) , .Machine$double.eps )
[1] TRUE
Now try:
all.equal( 2 , 1.981 , tolerance = 0.00949999999999 )
[1] "Mean relative difference: 0.0095"
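For a single pair of numbers, the check described above can be sketched by hand. The helper rel_diff below is hypothetical (it is not part of R). It mimics only the scale = NULL case and ignores attributes, NA handling, and everything else all.equal does.

```r
# Hypothetical helper: a simplified imitation of the scale = NULL case
# of all.equal for plain numeric input. Not the real implementation.
rel_diff <- function(target, current, tolerance = sqrt(.Machine$double.eps)) {
  xy <- mean(abs(target - current))   # mean absolute difference
  xn <- mean(abs(target))             # mean absolute target value
  # Use a relative scale unless the target is (nearly) zero or not finite.
  if (is.finite(xn) && xn > tolerance) xy <- xy / xn
  xy
}

rel_diff(2, 1.981, tolerance = 0.01)   # about 0.0095, below 0.01
rel_diff(2, 1.98,  tolerance = 0.01)   # right at the 0.01 boundary; rounding decides
rel_diff(2, 1.99999998)                # about 1e-08, below the default tolerance
rel_diff(2, 1.99999997)                # about 1.5e-08, above the default tolerance
```

This also accounts for the first set of results in the question: the default tolerance is sqrt(.Machine$double.eps), roughly 1.5e-08, and it is compared against the difference divided by 2, not against the raw difference.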
This happens because in this case all.equal checks relative differences. If you set scale=1, i.e. no scaling, absolute comparisons will be made and all.equal behaves as you are expecting.
For further details see the documentation on the scale parameter.
> all.equal(2,1.980000000001,tolerance=0.01)
[1] TRUE
> all.equal(2,1.980000000001,tolerance=0.01,scale=1)
[1] "Mean scaled difference: 0.02"
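If what you want is a plain TRUE/FALSE answer on an absolute scale, one possible sketch is a small wrapper. The name abs_equal is hypothetical. It relies only on all.equal's tolerance and scale arguments and on isTRUE, which is needed because all.equal returns a character string rather than FALSE when the values differ.

```r
# Hypothetical wrapper: TRUE only when the mean absolute difference is
# within tol. scale = 1 switches off the relative scaling.
abs_equal <- function(x, y, tol) {
  isTRUE(all.equal(x, y, tolerance = tol, scale = 1))
}

abs_equal(2, 1.981, tol = 0.01)   # FALSE: |2 - 1.981| = 0.019 > 0.01
abs_equal(2, 1.995, tol = 0.01)   # TRUE:  |2 - 1.995| = 0.005 < 0.01
```

For two single numbers, abs(x - y) <= tol gives the same answer more directly; the wrapper is mainly useful if you want to keep all.equal's other checks.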