Replace <NA> in a factor column

Tags: replace, dataframe, r, na

I want to replace <NA> values in a factor column with a valid value, but I cannot find a way. This example is only for demonstration. The original data comes from a foreign csv file I have to deal with.

    df <- data.frame(a=sample(0:10, size=10, replace=TRUE),
                     b=sample(20:30, size=10, replace=TRUE))
    df[df$a==0,'a'] <- NA
    df$a <- as.factor(df$a)

It could look like this:

          a  b
    1     1 29
    2     2 23
    3     3 23
    4     3 22
    5     4 28
    6  <NA> 24
    7     2 21
    8     4 25
    9  <NA> 29
    10    3 24

Now I want to replace the <NA> values with a number.

    df[is.na(df$a), 'a'] <- 88
    In `[<-.factor`(`*tmp*`, iseq, value = c(88, 88)) :
      invalid factor level, NA generated

I think I have missed a fundamental R concept about factors. Have I? I cannot understand why it doesn't work. I think invalid factor level means that 88 is not a valid level in that factor, right? So do I have to tell the factor column that there is another level?


asked Aug 24 '16 14:08

buhtz

2 Answers

1) addNA If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA

To force the NA level to be 88:

    facna <- addNA(fac)
    levels(facna) <- c(levels(fac), 88)

giving:

    > facna
     [1] 1  2  3  3  4  88 2  4  88 3
    Levels: 1 2 3 4 88
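Applied to the data frame column from the question, approach (1) might look like the following. This is a minimal sketch: the fixed values in df are made up for illustration (the question uses random samples), and old_levels and a_with_na are names introduced here.

```r
# Example data with NA values in a factor column (values chosen for illustration)
df <- data.frame(a = c(1, 2, 3, 3, 4, NA, 2, 4, NA, 3),
                 b = c(29, 23, 23, 22, 28, 24, 21, 25, 29, 24))
df$a <- as.factor(df$a)

# Remember the original levels, then make NA an explicit (last) level
old_levels <- levels(df$a)
a_with_na <- addNA(df$a)

# Rename that trailing NA level to "88" and write the column back
levels(a_with_na) <- c(old_levels, "88")
df$a <- a_with_na

levels(df$a)  # "1" "2" "3" "4" "88"
anyNA(df$a)   # FALSE
```

The relabelling relies on addNA() placing the NA level last, so the new label has to go at the end of the levels vector.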

1a) This can be written in a single line as follows:

`levels<-`(addNA(fac), c(levels(fac), 88))

2) factor It can also be done in one line using the various arguments of factor like this:

factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL)

2a) or equivalently:

factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL)

3) ifelse Another approach is:

factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88))
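As a quick sanity check, the base R variants (1a), (2), (2a) and (3) can be compared against each other. The sketch below builds its own small fac so it runs on its own; same_factor is a hypothetical helper introduced here that compares values and levels only.

```r
# Small input factor with two NA values
fac <- factor(c(1, 2, 3, 3, 4, NA, 2, 4, NA, 3))

# The four base R variants from above
r1a <- `levels<-`(addNA(fac), c(levels(fac), 88))
r2  <- factor(fac, levels = levels(addNA(fac)),
              labels = c(levels(fac), 88), exclude = NULL)
r2a <- factor(fac, levels = c(levels(fac), NA),
              labels = c(levels(fac), 88), exclude = NULL)
r3  <- factor(ifelse(is.na(fac), 88, paste(fac)),
              levels = c(levels(fac), 88))

# Hypothetical helper: same values and same levels in the same order
same_factor <- function(x, y) {
  identical(as.character(x), as.character(y)) &&
    identical(levels(x), levels(y))
}

# Stops with an error if any variant disagrees with (1a)
stopifnot(same_factor(r1a, r2), same_factor(r1a, r2a), same_factor(r1a, r3))
r1a
```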

4) forcats The forcats package has a function for this:

    library(forcats)

    fct_explicit_na(fac, "88")
    ## [1] 1  2  3  3  4  88 2  4  88 3
    ## Levels: 1 2 3 4 88

Note: We used the following for input fac

    fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L),
                     .Label = c("1", "2", "3", "4"), class = "factor")

Update: Have improved (1) and added (1a). Later added (4).

answered Nov 01 '22 13:11

G. Grothendieck

Another way to do it is:

    # check levels
    levels(df$a)
    #[1] "3"  "4"  "7"  "9"  "10"

    # add the new factor level, i.e. 88 in our example
    df$a = factor(df$a, levels=c(levels(df$a), 88))

    # convert all NA's to 88
    df$a[is.na(df$a)] = 88

    # check levels again
    levels(df$a)
    #[1] "3"  "4"  "7"  "9"  "10" "88"
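The two steps above (add the level first, then assign) could be wrapped in a small function for reuse. This is only a sketch: replace_na_level is a hypothetical name, not a base R function, and the example data is made up to match the levels shown above.

```r
# Hypothetical helper: replace NA in a factor with `value`,
# adding `value` as a new level first if it is not one already
replace_na_level <- function(x, value) {
  stopifnot(is.factor(x))
  value <- as.character(value)
  if (!value %in% levels(x)) {
    levels(x) <- c(levels(x), value)  # appending a level keeps existing values
  }
  x[is.na(x)] <- value                # valid now, because the level exists
  x
}

# Example data (values chosen for illustration)
df <- data.frame(a = factor(c(3, 4, NA, 7, 9, 10, NA)), b = 21:27)
df$a <- replace_na_level(df$a, 88)

levels(df$a)                     # "3" "4" "7" "9" "10" "88"
as.numeric(as.character(df$a))   # 3 4 88 7 9 10 88
```

If the numbers are needed again later, going through as.character() first matters, since as.numeric() on a factor returns the internal integer codes rather than the labels.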

answered Nov 01 '22 15:11

Karim Kanatov
