
Why does 1..99,999 == "1".."99,999" in R, but 100,000 != "100,000"?

In the console, go ahead and try

> sum(sapply(1:99999, function(x) { x != as.character(x) }))
0

For all values 1 through 99999, the comparisons "1" == 1, "2" == 2, ..., 99999 == "99999" are TRUE. However,

> 100000 == "100000"
FALSE

Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check if every element in an atomic character vector is in fact numeric? Right now I was trying to check whether x == as.numeric(x) for each x, but that fails on certain datasets due to the above problem!

Robert Krzyzanowski asked Sep 23 '13 16:09


1 Answer

Have a look at as.character(100000). Its value is not "100000", and R is essentially just telling you so.

as.character(100000)
# [1] "1e+05"

Here, from ?Comparison, are R's rules for applying relational operators to values of different types:

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
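As a small sketch of that precedence rule (the `checks` vector and its element names are only illustrative, not from the documentation), the comparisons below each mix two types, and in each case the lower-precedence side is coerced before the test is made:

```r
# Each comparison mixes two atomic types; R coerces the lower-precedence
# operand to the higher-precedence type before comparing.
checks <- c(
  num_vs_chr   = 1 == "1",       # 1 becomes "1"; "1" == "1"
  lgl_vs_chr   = TRUE == "TRUE", # TRUE becomes "TRUE"; strings match
  lgl_vs_chr_1 = TRUE == "1",    # TRUE becomes "TRUE", which is not "1"
  lgl_vs_num   = TRUE == 1       # TRUE becomes the number 1
)
print(checks)
```

The third case is the instructive one: TRUE equals 1 as a number, but not "1" as a string, because the comparison happens on the character representation.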

Those rules mean that when you test whether 1 == "1", say, R first converts the numeric value on the LHS to a character string, and then tests the two character strings for equality. In some cases those will be equal, but in others they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").

So, when you type 100000 == "100000", it is as if you were actually performing the following test (though internally R may well use something other than as.character() to perform the conversion):

as.character(100000)=="100000"
# [1] FALSE
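
For the workaround the question asks about, one possible approach is to coerce in the other direction, from character to numeric, so that the way a number is formatted as a string never enters the comparison. This is only a sketch: `is_numeric_string` is a hypothetical helper name, and it treats NA and "NaN" entries as non-numeric, which may or may not be what a given dataset needs.

```r
# Hypothetical helper (the name is illustrative, not from the original post).
# Returns TRUE for each element of a character vector that parses as a number.
is_numeric_string <- function(x) {
  # as.numeric() yields NA, with a warning, for strings it cannot parse,
  # so the warning is suppressed and the NA is used as the signal.
  parsed <- suppressWarnings(as.numeric(x))
  !is.na(parsed)
}

x <- c("1", "99999", "100000", "1e+05", "abc")
is_numeric_string(x)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Comparing as numbers sidesteps the "1e+05" formatting entirely.
as.numeric("100000") == 100000
# [1] TRUE
```

To check a whole vector at once, wrap the call in all(), as in all(is_numeric_string(x)).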

Josh O'Brien answered Nov 01 '22 10:11