I just discovered the following behaviour of the is.na() function, which I don't understand:
df <- data.frame(a = 5:1, b = "text")
df
## a b
## 1 5 text
## 2 4 text
## 3 3 text
## 4 2 text
## 5 1 text
is.na(df)
## a b
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,] FALSE FALSE
## [5,] FALSE FALSE
is.na(df) <- "0"
df
## a b 0
## 1 5 text NA
## 2 4 text NA
## 3 3 text NA
## 4 2 text NA
## 5 1 text NA
My question
Why does is.na() change its argument (and in this case add an extra column to the data frame)? Here its behaviour seems extra puzzling (or at least unexpected) because the result of the query is FALSE for all instances.
NB
This question is not about subsetting and changing the NA values in a data frame - I know how to do that (df[is.na(df)] <- "0"). This question is about the behaviour of the is.na function! Why does an assignment to an is.something function change the argument itself? This is unexpected.
is.na() is a built-in R function that tests each element of its argument: it returns TRUE where the value is NA and FALSE otherwise.
To check which values in an R data frame are NA, call is.na() on the data frame directly. This returns a logical matrix of TRUE and FALSE with the same dimensions as the data frame.
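As a minimal sketch of that check, the snippet below builds a small data frame that actually contains missing values and inspects the result of is.na(). The data frame `df_na` and the name `na_mask` are made up for illustration and do not appear in the question.

```r
# Hypothetical example data; `df_na` is not from the original question.
df_na <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# is.na() on a data frame returns a logical matrix, one cell per value.
na_mask <- is.na(df_na)
print(na_mask)

# Count the NA values in each column.
print(colSums(na_mask))
```

Because the result is a plain logical matrix, it can be used directly as an index, which is what the df[is.na(df)] <- "0" idiom from the question relies on.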
The actual function being used here is not is.na() but the assignment function `is.na<-`, for which the default method is `is.na<-.default`. Printing that function to the console we see:
function (x, value)
{
x[value] <- NA
x
}
So clearly, value is supposed to be an index here. If you index a data.frame like df["0"], it will try to select the column named "0". If you assign something to df["0"], the column will be created and filled with (in this case) NA.
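To make the mechanism concrete, here is a small sketch that performs the same operation in three ways: the replacement syntax from the question, an explicit call to `is.na<-`, and the plain indexing assignment that the default method boils down to. The copies `df1`, `df2` and `df3` are names introduced only for this comparison.

```r
df1 <- data.frame(a = 5:1, b = "text")
df2 <- df1
df3 <- df1

# Replacement-function syntax, as in the question.
is.na(df1) <- "0"

# Roughly what R evaluates for the line above:
# call `is.na<-` and assign the result back to the variable.
df2 <- `is.na<-`(df2, value = "0")

# What the default method does internally: x[value] <- NA.
df3["0"] <- NA

# Compare the three results.
print(identical(df1, df2))
print(identical(df1, df3))
print(names(df1))
```

Seen this way, the extra column is not a side effect of querying for NA values; it is the ordinary behaviour of assigning to a column name that does not exist yet.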
To clarify, `is.na<-` sets values to NA, it does not replace NA values with something else.
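A short sketch of the intended use on plain vectors may help; the vectors `x` and `y` are invented for this example. The right-hand side is an index (positions or names) marking which elements should become NA.

```r
# Index by position: elements 2 and 4 become NA.
x <- c(10, 20, 30, 40)
is.na(x) <- c(2, 4)
print(x)

# Index by name: the element named "b" becomes NA.
y <- c(a = 1, b = 2, c = 3)
is.na(y) <- "b"
print(y)

# The opposite direction (replacing NA with a value) uses is.na() as an index.
x[is.na(x)] <- 0
print(x)
```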