I have a data frame with several factor columns containing NaN values that I would like to convert to NA (the NaN seems to be a problem when using linear regression objects to predict on new data).
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = NA
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
NaN stands for "Not a Number" in R. It represents an undefined or unrepresentable result of floating-point arithmetic, yet it still belongs to the numeric data type. To remove rows that contain NaN from a data frame, you can use na.omit().
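A minimal sketch of both points, using a small made-up data frame: NaN is stored as a double, and because is.na() is also TRUE for NaN, na.omit() drops the row that contains it.

```r
# NaN is a numeric (double) value, not a separate type
print(typeof(NaN))   # "double"

# Made-up example data: the second row contains NaN
df <- data.frame(a = c(1, NaN, 3), b = c("x", "y", "z"))

# na.omit() drops rows containing NA; is.na(NaN) is TRUE, so the NaN row goes too
clean <- na.omit(df)
print(clean)
```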
In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., 0/0) are represented by the symbol NaN (not a number); note that dividing a nonzero number by zero gives Inf, not NaN. Unlike SAS, R uses the same symbol, NA, for missing character and numeric data.
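The distinction between Inf, NaN and NA can be checked directly in the console. This sketch uses only base R and made-up vectors.

```r
# Dividing a nonzero number by zero gives Inf, not NaN
pos_inf <- 1 / 0
# 0/0 is undefined, so the result is NaN
undefined <- 0 / 0

# The same NA symbol marks missing values in numeric and character vectors
num <- c(1.5, NA)
chr <- c("a", NA)

print(c(pos_inf, undefined))   # Inf NaN
print(typeof(chr[2]))          # "character"
```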
To check for NaN values in R, use the built-in is.nan() function. It tests each element of its argument and returns TRUE where the value is NaN and FALSE otherwise.
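A short sketch of how is.nan() differs from is.na() on a made-up numeric vector: is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
v <- c(1, NaN, NA, Inf)

# TRUE only where the value is NaN
print(is.nan(v))   # FALSE  TRUE FALSE FALSE
# TRUE for both NA and NaN
print(is.na(v))    # FALSE  TRUE  TRUE FALSE
```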
Here's the problem: your vector is of mode character, so of course it's "not a number". When c() combined NaN with strings, that last element was converted to the string "NaN". Using is.nan() only makes sense if the vector is numeric. If you want a value to be missing in a character vector (so that it gets handled properly by regression functions), then use NA_character_ (without any quotes):
> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
> tester1
[1] "2" "2" "3" "4" "2" "3" NA
> is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
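The coercion described above can be checked directly. In this sketch (a shortened copy of the question's vector), NaN becomes the ordinary string "NaN", which is.na() does not treat as missing.

```r
# Mixing strings with NaN coerces everything to character
x <- c("2", "3", NaN)
print(typeof(x))      # "character"
print(x[3] == "NaN")  # TRUE: it is now just a string
print(is.na(x[3]))    # FALSE: the string "NaN" is not a missing value
```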
Neither "NA" nor "NaN" is really missing in a character vector; both are just strings. If for some reason there were values in a factor variable that were "NaN", then you could simply have used logical indexing:
tester1[tester1 == "NaN"] = "NA"
# but that would not really be a missing value either
# and it might screw up a factor variable anyway.
> # here tester1 is a factor, and "NA" is not one of its levels
> tester1[tester1 == "NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
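For a plain character vector, the same logical indexing does produce a real missing value, as long as the replacement is an unquoted NA rather than the string "NA". A minimal sketch, starting from the question's vector as it was actually stored:

```r
tester1 <- c("2", "2", "3", "4", "2", "3", "NaN")

# Assign an unquoted NA; in a character vector it becomes NA_character_
tester1[tester1 == "NaN"] <- NA
print(tester1)
print(is.na(tester1))
```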
##########
> tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2 2 3 4 2 3 <NA>
Levels: 2 3 4 NaN
That last result might be surprising. There is still a "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, signified in print as <NA>.
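To get rid of the leftover "NaN" level as well, base R's droplevels() can be applied after the assignment. The sketch below does this for every factor column of a data frame, which is the situation in the question; nan_to_na() is a hypothetical helper name and the data are made up.

```r
# Hypothetical helper (not part of base R): turn the "NaN" level of a factor
# into a real missing value, then drop the level that is no longer used
nan_to_na <- function(f) {
  f[which(f == "NaN")] <- NA
  droplevels(f)
}

# Made-up data frame with two factor columns and one numeric column
df <- data.frame(
  x = factor(c("2", "3", "NaN")),
  y = factor(c("a", "NaN", "b")),
  z = c(1, 2, 3)
)

# Apply the helper only to factor columns; other columns are left unchanged
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], nan_to_na)

str(df)
```

Using which() on the comparison keeps the assignment safe if a column already contains NA values, since the comparison would return NA for those positions.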