I want to replace <NA> values in a factor column with a valid value, but I cannot find a way to do it. The example below is only for demonstration; the original data comes from a third-party CSV file I have to deal with.
df <- data.frame(a=sample(0:10, size=10, replace=TRUE), b=sample(20:30, size=10, replace=TRUE))
df[df$a==0,'a'] <- NA
df$a <- as.factor(df$a)
The result could look like this:
      a  b
1     1 29
2     2 23
3     3 23
4     3 22
5     4 28
6  <NA> 24
7     2 21
8     4 25
9  <NA> 29
10    3 24
Now I want to replace the <NA> values with a number.
df[is.na(df$a), 'a'] <- 88
In `[<-.factor`(`*tmp*`, iseq, value = c(88, 88)) :
  invalid factor level, NA generated
I think I am missing a fundamental R concept about factors. Am I? I cannot understand why this doesn't work. I assume "invalid factor level" means that 88 is not a valid level in that factor, right? So do I have to tell the factor column that there is another level?
The codes of a factor may contain NA. For a numeric x, set exclude=NULL to make NA an extra level ("NA"), by default the last level. If "NA" is a level, the way to set a code to be missing is to use is.na on the left-hand-side of an assignment. Under those circumstances missing values are printed as <NA>.
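The passage above can be illustrated with a minimal sketch. The vector x and the factor f are made-up example data, not taken from the question; the point is only to show the difference between "NA as a level" and "a missing code".

```r
# Made-up example data (an assumption for illustration)
x <- c(1, 2, NA, 2)

# exclude = NULL keeps NA as an extra level instead of dropping it
f <- factor(x, exclude = NULL)
levels(f)   # the level set now contains NA as its last entry
is.na(f)    # all FALSE: each element points at a level, even the NA one

# Mark the first element as genuinely missing by using is.na() on the
# left-hand side of an assignment; its underlying code becomes NA
is.na(f)[1] <- TRUE
is.na(f)    # only the first element is reported as missing now
```

Note that the third element still refers to the NA level, so is.na() does not flag it; only the element whose code was set to missing is flagged.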
The classic way to replace NAs in R is to use the is.na() function (R is case-sensitive, so it must be written in lower case). is.na() takes a vector or data frame as input and returns a logical object that indicates whether each value is missing (TRUE or FALSE). You can then use this logical object to select the missing values and assign a replacement, such as zero, to them.
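A short sketch of that pattern, using made-up vectors x and f (assumptions for illustration). For a plain numeric vector the assignment works directly; for a factor it only works when the replacement is already one of the levels, which is why the assignment in the question produces a warning.

```r
# Numeric vector: select the missing entries and overwrite them
x <- c(3, NA, 7, NA)
na_idx <- is.na(x)      # logical index of the missing values
x[na_idx] <- 0

# Factor: the same pattern works only for an existing level
f <- factor(c("a", NA, "b"))
f[is.na(f)] <- "a"      # fine, "a" is already a level of f
```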
1) addNA If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA.
To force the NA level to be 88:
facna <- addNA(fac)
levels(facna) <- c(levels(fac), 88)
giving:
> facna
 [1] 1 2 3 3 4 88 2 4 88 3
Levels: 1 2 3 4 88
1a) This can be written in a single line as follows:
`levels<-`(addNA(fac), c(levels(fac), 88))
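Applied to a data frame column like the one in the question, approach (1) might look like the sketch below. The data frame uses fixed values that mirror the printed example instead of sample(), so the result is reproducible; the intermediate name a_na is an assumption for illustration.

```r
# Fixed data mirroring the example output in the question
df <- data.frame(a = c(1, 2, 3, 3, 4, NA, 2, 4, NA, 3),
                 b = c(29, 23, 23, 22, 28, 24, 21, 25, 29, 24))
df$a <- as.factor(df$a)

# Add NA as an explicit level, then rename that level to "88"
a_na <- addNA(df$a)
levels(a_na) <- c(levels(df$a), "88")
df$a <- a_na

table(df$a)   # counts per level, including the new "88" level
```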
2) factor It can also be done in one line using the various arguments of factor, like this:
factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL)
2a) or equivalently:
factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL)
3) ifelse Another approach is:
factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88))
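The reason for converting the factor to character first (paste(fac) above, or as.character() as used here) is that ifelse() works on the underlying integer codes of a factor rather than on its labels. The sketch below uses a made-up factor whose labels differ from its codes to make that visible; the names f, codes and fixed are assumptions for illustration.

```r
# Made-up factor whose labels differ from its internal codes 1, 2, 3
f <- factor(c("3", "7", NA, "10"), levels = c("3", "7", "10"))

# Without the conversion, ifelse() returns the integer codes, not the labels
codes <- ifelse(is.na(f), 88, f)

# With the conversion, the labels survive and can be turned back into a factor
fixed <- factor(ifelse(is.na(f), "88", as.character(f)),
                levels = c(levels(f), "88"))
```

With the input used in the answers (levels "1" to "4") codes and labels happen to coincide, which can hide this problem.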
4) forcats The forcats package has a function for this:
library(forcats)
fct_explicit_na(fac, "88")
## [1] 1 2 3 3 4 88 2 4 88 3
## Levels: 1 2 3 4 88
Note: We used the following as input fac:
fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L), .Label = c("1", "2", "3", "4"), class = "factor")
Update: Have improved (1) and added (1a). Later added (4).
Another way to do this is:
# check levels
levels(df$a)
# [1] "3" "4" "7" "9" "10"
# add the new factor level, i.e. 88 in our example
df$a = factor(df$a, levels=c(levels(df$a), 88))
# convert all NAs to 88
df$a[is.na(df$a)] = 88
# check levels again
levels(df$a)
# [1] "3" "4" "7" "9" "10" "88"
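The two steps above (extend the levels, then assign) could be wrapped in a small helper so they are always done together. This is only a sketch: replace_na_level is a hypothetical name, not a base R or package function, and it assumes NA is not itself already a level of the factor.

```r
# Hypothetical helper (not part of base R): replace missing values in a
# factor with `value`, adding `value` as a level first if necessary
replace_na_level <- function(f, value) {
  stopifnot(is.factor(f))
  value <- as.character(value)
  # union() keeps the existing level order and avoids a duplicate level
  f <- factor(f, levels = union(levels(f), value))
  f[is.na(f)] <- value
  f
}

# Usage with made-up data
f <- factor(c("3", NA, "10"), levels = c("3", "10"))
out <- replace_na_level(f, 88)
levels(out)
```

Because the new level is registered before the assignment, the "invalid factor level" warning from the question does not occur.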