
R can't convert NaN to NA

Tags: r, nan, na

I have a data frame with several factor columns containing NaN's that I would like to convert to NA's (the NaN seems to be a problem for using linear regression objects to predict on new data).

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = NA
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
screechOwl asked Feb 27 '12 22:02



1 Answer

Here's the problem: your vector is of character mode, so of course it's "not a number". When you built it with c(), that last element was coerced to the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want to make a value missing in a character vector (so that it gets handled properly by regression functions), then use NA_character_, without any quotes.

> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
>  tester1
[1] "2" "2" "3" "4" "2" "3" NA 
>  is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
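
To see why is.nan never matched in the question, it can help to inspect the vector's type directly. The following is only a minimal sketch; the name `tester2` is illustrative and not from the original post.

```r
# c() coerces every element to character as soon as one element is a string,
# so the numeric NaN is stored as the ordinary string "NaN".
tester2 <- c("2", "2", "3", "4", "2", "3", NaN)
class(tester2)

# is.nan() does not report TRUE for character input, so nothing is selected.
any(is.nan(tester2))

# Compare against the string instead and assign a real missing value.
tester2[tester2 == "NaN"] <- NA
is.na(tester2)
```

Assigning a bare NA into a character vector stores it as NA_character_, so is.na() picks it up afterwards.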

Neither "NA" nor "NaN" is really missing in a character vector; both are ordinary strings. If for some reason there were values in a factor variable that were "NaN", then you would have been able to just use logical indexing:

tester1[tester1 == "NaN"] = "NA"  
# but that would not really be a missing value either 
# and it might screw up a factor variable anyway.

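# When tester1 is a factor, assigning the string "NA" (which is not an existing level) warns:
> tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
  invalid factor level, NAs generated

# Starting over with a fresh factor and assigning a real missing value instead:
> tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))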

> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2    2    3    4    2    3    <NA>
Levels: 2 3 4 NaN

That last result might be surprising. There is a remaining "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, signified in print as <NA>.
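
Since the question mentions a data frame with several factor columns, the same idea could be applied column by column, with droplevels() removing the leftover "NaN" level. This is a sketch under assumptions: the data frame `df` and its column names are made up for illustration and are not from the original post.

```r
# Hypothetical data frame with two factor columns and one numeric column.
df <- data.frame(
  f1 = factor(c("2", "3", "NaN", "2")),
  f2 = factor(c("a", "NaN", "b", "a")),
  x  = c(1.5, 2.0, NaN, 4.0)
)

# Factor columns: turn "NaN" entries into real NAs, then drop the unused level.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], function(f) {
  f[which(f == "NaN")] <- NA
  droplevels(f)
})

# Numeric columns: here is.nan() is the appropriate test.
df$x[is.nan(df$x)] <- NA

levels(df$f1)
levels(df$f2)
```

Using which() keeps the index free of NAs if a column already contains missing values, and droplevels() is optional if the unused level does no harm in later modelling.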

IRTFM answered Oct 15 '22 09:10