R dplyr method to replace all empty factors with NA

Tags: r, dplyr

Instead of writing a data frame to disk and reading it back in just to convert every empty factor value to NA with this argument,

na.strings=c("","NA")

I wanted to just apply a function to all the columns and substitute the empties with NA. I've selected the factor columns so far but don't know what to do next.

df %>% select_if(is.factor) %>% ....

How can I do this, preferably with dplyr and/or the apply family of functions?

Ricky asked Mar 28 '17 02:03


1 Answer

We can use mutate_if to apply a function to the factor columns only

df <- df %>%
         mutate_if(is.factor, funs(factor(replace(., .=="", NA))))

With dplyr 0.8.0, we can also do

df %>% 
    mutate_if(is.factor, na_if, y = "") 

or replace funs (which is being deprecated) with list, as @Frederick mentioned in the comments

df %>%
   mutate_if(is.factor, list(~ na_if(., "")))
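In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(). A minimal sketch of the same idea, assuming dplyr >= 1.0.0 is installed; the name df2 is only for illustration, and the data frame mirrors the example data in this answer:

```r
library(dplyr)

# Example data; stringsAsFactors = TRUE is needed on R >= 4.0.0 to get factors
df <- data.frame(col1 = c("", "A", "B", ""), col2 = c("A", "", "", "C"),
                 col3 = 1:4, stringsAsFactors = TRUE)

# Replace "" with NA in every factor column, then rebuild the factor
# so that the now-unused "" level is dropped
df2 <- df %>%
  mutate(across(where(is.factor), ~ factor(replace(.x, .x == "", NA))))

colSums(is.na(df2))   # number of NA values per column
```

Rebuilding with factor() is a design choice here: it keeps the result free of a leftover "" level, at the cost of also dropping any other unused levels.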

Or using base R we can assign the specific levels to NA

j1 <- sapply(df, is.factor)
# recode the "" level to NA in each factor column; this also drops the level
df[j1] <- lapply(df[j1], function(x) {levels(x)[levels(x) == ""] <- NA; x})
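There are two ways to do this in base R, and they differ in what happens to the "" level. The sketch below, using a hypothetical factor x and only base R, compares marking elements as NA with recoding the level itself:

```r
x <- factor(c("A", "", "", "C"))

# Element-wise: mark the empty entries as NA; the "" level itself remains
a <- x
is.na(a) <- a == ""
levels(a)

# Level-wise: recode the "" level to NA, which also removes it from the levels
b <- x
levels(b)[levels(b) == ""] <- NA
levels(b)

# After dropping the leftover level, both approaches hold the same values
identical(as.character(droplevels(a)), as.character(b))
```

Note that the logical index given to `is.na<-` refers to elements, so it should have the same length as the factor (as `a == ""` does), not the length of `levels(a)`.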

data

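# stringsAsFactors = TRUE is required on R >= 4.0.0, where the default is FALSE
df <- data.frame(col1 = c("", "A", "B", ""), col2 = c("A", "", "", "C"),
         col3 = 1:4, stringsAsFactors = TRUE)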
akrun answered Oct 16 '22 11:10