How to replace empty string with NA in R dataframe?

Tags: r, na, dplyr

My first approach was to use na.strings="" when reading the data in from a CSV file. This didn't work for some reason. I also tried:

df[df==''] <- NA

Which gave me an error: Can't use matrix or array for column indexing.

I tried just the column:

df$col[df$col==''] <- NA

This converts every value in the entire dataframe to NA, even though there are values besides empty strings.

Then I tried to use mutate_all:

replace.empty <- function(a) {
    a[a==""] <- NA
}

#dplyr pipe
df %>% mutate_all(funs(replace.empty))

This also converts every value in the entire dataframe to NA.

I suspect something is weird about my "empty" strings since the first method had no effect but I can't figure out what.

EDIT (at request of MKR) Output of dput(head(df)):

structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{",
"    if (missing(ncp)) ", "        .Call(C_df, x, df1, df2, log)",
"    else .Call(C_dnf, x, df1, df2, ncp, log)", "}"), .Dim = c(6L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), ""), class = 
"noquote")
mp3242 asked Jul 20 '18 19:07


1 Answer

I'm not sure why df[df==""]<-NA would not have worked for OP. Let's take a sample data.frame and investigate options.
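One thing that may be worth ruling out first: the dput(head(df)) output in the question looks like the source of the built-in stats::df function (the F density) rather than a data frame. That can happen when df has not been assigned in the current session. This is only a guess about OP's setup. A minimal sketch of a sanity check, where my_data is a hypothetical name used for illustration:

```r
# In a fresh session, `df` resolves to stats::df, which is a function
fn_is_function <- is.function(stats::df)
fn_is_data_frame <- is.data.frame(stats::df)

# Hypothetical data frame; confirm the object type before cleaning it
my_data <- data.frame(x = c("a", ""), stringsAsFactors = FALSE)
data_is_data_frame <- is.data.frame(my_data)
```

If is.data.frame() returns FALSE for the object being cleaned, the replacement code is not operating on the data at all.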

Option#1: Base-R

df[df==""]<-NA

df
#    One  Two Three Four
# 1    A    A  <NA>  AAA
# 2 <NA>    B    BA <NA>
# 3    C <NA>    CC  CCC

Option#2: dplyr::mutate_all with na_if, or mutate_if when the data frame mixes column types

library(dplyr)

mutate_all(df, list(~na_if(.,"")))

OR

# If the data frame also has non-character columns, restrict the replacement to character columns
df %>% mutate_if(is.character, list(~na_if(.,""))) 

#    One  Two Three Four
# 1    A    A  <NA>  AAA
# 2 <NA>    B    BA <NA>
# 3    C <NA>    CC  CCC
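In dplyr 1.0.0 and later, the scoped verbs mutate_all and mutate_if are superseded by across(). A minimal sketch of the same replacement, assuming dplyr >= 1.0.0 and the toy data from this answer (df_na is just an illustrative name):

```r
library(dplyr)

# Toy data from this answer
df <- data.frame(One = c("A", "", "C"),
                 Two = c("A", "B", ""),
                 Three = c("", "BA", "CC"),
                 Four = c("AAA", "", "CCC"),
                 stringsAsFactors = FALSE)

# Replace "" with NA in character columns only; other column types are left untouched
df_na <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))
```

Restricting to where(is.character) matters because na_if expects the replacement value to be compatible with the column type.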

Toy Data:

df <- data.frame(One=c("A","","C"), 
                 Two=c("A","B",""), 
                 Three=c("","BA","CC"), 
                 Four=c("AAA","","CCC"), 
                 stringsAsFactors = FALSE)

df
#   One Two Three Four
# 1   A   A        AAA
# 2       B    BA     
# 3   C        CC  CCC
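As for the replace.empty helper in the question: its last expression is an assignment, so the function returns the assigned value (NA) instead of the modified vector, which would explain why every value became NA under mutate_all. A sketch of a corrected helper follows. The name replace_empty and the trimws() step are assumptions: trimming only helps if OP's "empty" cells actually contain whitespace, which the question does not confirm.

```r
# Hypothetical corrected version of the question's helper
replace_empty <- function(a) {
  a[trimws(a) == ""] <- NA  # treat whitespace-only strings as empty too
  a                         # return the modified vector, not the assignment's value
}

# Small example with one whitespace-only cell and one truly empty cell
df <- data.frame(One = c("A", " ", "C"),
                 Two = c("A", "B", ""),
                 stringsAsFactors = FALSE)

# Apply column by column while keeping the data.frame structure
df[] <- lapply(df, replace_empty)
```

The same helper can be passed to dplyr, for example inside mutate_all or across, once it returns the vector.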
MKR answered Oct 11 '22 18:10