In the R Language Definition, NA values are briefly described, a portion of which says
... In particular, FALSE & NA is FALSE, TRUE | NA is TRUE. NA is not equal to any other value or to itself; testing for NA is done using is.na. However, an NA value will match another NA value in match.
Regarding the statement "NA is not equal to any other value or to itself",
Updated: The question, revised again, is:
What is the reasoning, if any, behind NA matching NA in match, and nowhere else in the language?
It doesn't make sense to me that a missing value, unknown by anyone (or it would not be missing), would match another missing value of the same type. Since I posted this, I came across something in example(match) that provides some reasoning. Coercing NA to character with paste changes its type: the result is the ordinary string "NA", which is no longer missing, and I can erase it completely with gsub if I like.
match(NA, NA)
# [1] 1
match(NA, NA_real_)
# [1] 1
match(NA_character_, NA_real_)
# [1] 1
match(paste(NA), NA)
# [1] NA
gsub("NA", "", NA)
# [1] NA
gsub("NA", "", paste(NA))
# [1] ""
is.na(NA)
# [1] TRUE
is.na(paste(NA))
# [1] FALSE
Apologies for stirring the pot, but some of the documentation is unclear about this. It might boil down to the R parser/deparser and the fact that you can turn anything into a text character object in R.
Original Post:
Now referring to "However, an NA value will match another NA value in match."
If NA is not equal to itself, why is it matched with itself in match? And also in identical? Is this done on purpose?
NA == NA ## expecting TRUE
# [1] NA
NA != NA
# [1] NA
x <- NA
x == x
# [1] NA
match(NA, NA)
# [1] 1
identical(NA, NA)
# [1] TRUE
all.equal(NA, NA)
# [1] TRUE
It's a matter of convention. There are good reasons for the way == works. NA is a special value in R that is supposed to represent data that is missing and should be treated differently from the rest of the data. There are innumerable very subtle bugs that could come up if we started comparing missing values as if they were known or as if two missing values were equal to each other.
Think of NA as meaning "I don't know what's there". The correct answer to 3 > NA is obviously NA because we don't know if the missing value is larger than 3 or not. Well, it's the same for NA == NA. They are both missing values but the true values could be quite different, so the correct answer is "I don't know."
R doesn't know what you are doing in your analysis, so instead of potentially introducing bugs that would later end up being published and embarrassing you, it doesn't allow comparison operators to think NA is a value.
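As a small illustration of the "I don't know" reading above, here is a sketch in base R. The variable names are made up for the example; the only functions used are the comparison and logical operators, is.na, and which.

```r
# Comparisons with a missing value stay missing.
unknown <- NA
cmp_gt <- 3 > unknown          # we cannot know, so the result is NA
cmp_eq <- unknown == unknown   # two unknowns: still NA

# Logical operators only return a definite answer when the missing
# operand cannot change the result (as the quoted definition says).
and_result <- FALSE & NA       # FALSE whatever NA stands for
or_result  <- TRUE | NA        # TRUE whatever NA stands for

# Testing for missingness: == never finds NA, is.na() does.
x <- c(1, NA, 3)
found_by_eq    <- which(x == NA)   # integer(0)
found_by_is_na <- which(is.na(x))  # 2
```

The practical upshot is that x == NA is never the right test for missing values; is.na(x) is.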
match() was written with a more specific purpose in mind: finding the indexes of matching values. If you ask the question "Should I match 3 with NA", a reasonable answer is "no." It's a different (and very useful) convention, and it's justified because R pretty much knows what you are trying to do when you invoke match(). Now, should we match NA with NA for this purpose? It could be argued.
Come to think of it, I suppose it is a little odd that the authors of match() chose to allow NA to match itself by default. You can imagine cases where you might use match() to find NA rows in table along with other values, but it's dangerous. You just have to be a bit more careful about knowing whether you have any NA values in x, and only permit them to match if you really want that. You can change this behavior by specifying incomparables=NA when calling match().
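A minimal sketch of the default behavior next to incomparables = NA. The vectors x and tbl are invented for the example; match and its incomparables argument are base R.

```r
x   <- c(1, NA, 3)
tbl <- c(NA, 3)

# Default: the NA in x matches the NA at position 1 of tbl.
default_idx <- match(x, tbl)

# With incomparables = NA, NA is declared unmatchable, so the NA in x
# gets the nomatch value (NA) instead of an index.
strict_idx <- match(x, tbl, incomparables = NA)

print(default_idx)  # NA  1  2
print(strict_idx)   # NA NA  2
```

Note that in the second result the two NA entries mean "no match", not "matched a missing value".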
To add to @farnsy's great answer, and to elaborate on the difference between == and match:
The key thing to consider is how these two functions (== and match) are used.
x == y
translation: Is the value on the left the same value as the one on the right
match(x, table)
translation: Is the value on the left found in the table on the right;
if so, return the index of the FIRST TIME that x appears in table
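The two translations above can be sketched with a small made-up vector (tbl is a hypothetical name, not from the original post): == answers element by element, while match returns the position of the first hit, or NA when there is none.

```r
tbl <- c("a", "b", "b")

# ==: "is this value the same as that one?", answered per element.
eq_result <- "b" == tbl        # FALSE TRUE TRUE

# match: "where does this value FIRST appear in the table?"
first_b <- match("b", tbl)     # 2, even though "b" also sits at position 3
no_hit  <- match("z", tbl)     # NA, because "z" is not in tbl
```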
A common use case I often encounter is working with a set of IDs. Especially when dealing with two different datasets that have been joined, I might be left with several NAs in one of my ID columns.
However, not all NAs represent the same real-life object.
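One way to read this point, as a sketch with invented ID vectors (ids_left and ids_right are hypothetical, not from the original post): by default, every missing ID on the left is mapped to the same missing entry on the right, as if the unknown IDs were one and the same object. Masking the result with is.na is one possible guard.

```r
ids_left  <- c("a1", NA, "b2", NA)
ids_right <- c("b2", NA, "c3")

# Default match(): both missing IDs are paired with position 2 of ids_right.
idx <- match(ids_left, ids_right)

# Guard: treat a missing ID on the left as "no match".
idx_guarded <- idx
idx_guarded[is.na(ids_left)] <- NA

print(idx)          # NA  2  1  2
print(idx_guarded)  # NA NA  1 NA
```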