My first approach was to use na.strings="" when I read the data in from a CSV. This doesn't work for some reason. I also tried:
df[df==''] <- NA
Which gave me an error: Can't use matrix or array for column indexing.
I tried just the column:
df$col[df$col==''] <- NA
This converts every value in the entire dataframe to NA, even though there are values besides empty strings.
Then I tried to use mutate_all:
replace.empty <- function(a) {
a[a==""] <- NA
}
#dplyr pipe
df %>% mutate_all(funs(replace.empty))
This also converts every value in the entire dataframe to NA.
I suspect something is weird about my "empty" strings since the first method had no effect but I can't figure out what.
EDIT (at request of MKR)
Output of dput(head(df)):
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{",
" if (missing(ncp)) ", " .Call(C_df, x, df1, df2, log)",
" else .Call(C_dnf, x, df1, df2, ncp, log)", "}"), .Dim = c(6L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), ""), class =
"noquote")
In R, one of the easiest ways to replace blanks with NA is the na_if() function from the dplyr package. na_if(x, y) returns x with every element equal to y (here, the empty string "") replaced by NA. Alternatively, you can use base R subsetting or the ifelse() function.
I'm not sure why df[df==""]<-NA would not have worked for OP. Let's take a sample data.frame and investigate options.
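Before going through the options, one possible explanation, judging by the dput output in the question: it does not look like data at all. It matches what head() returns for the function stats::df (the F distribution density), which is what the name df refers to when no object called df has been created in the workspace. This is only a guess about the OP's session; a minimal check, assuming a fresh R session with no user-defined df:

```r
# Assumption: fresh session, so `df` resolves to the F-density function
# from the stats package rather than to a data frame.
df_is_function <- is.function(df)
df_is_function

# head() on a function returns its deparsed source lines as a "noquote"
# character matrix, the same shape as the dput output in the question.
src <- head(stats::df)
class(src)
dim(src)
```

If is.function(df) is TRUE in the session where the errors occurred, the data was probably read into an object with a different name, or the assignment never ran.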
Option#1: Base-R
df[df==""]<-NA
df
#    One  Two Three Four
# 1    A    A  <NA>  AAA
# 2 <NA>    B    BA <NA>
# 3    C <NA>    CC  CCC
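For reference, the replace.empty helper from the question can be made to work in base R as well. As written, its last expression is the assignment, so the function returns the assigned value (NA) rather than the modified vector, which would explain every value becoming NA. A minimal sketch, assuming the toy data frame from this answer and a corrected helper under the hypothetical name replace_empty:

```r
# Toy data, same as in this answer
df <- data.frame(One = c("A", "", "C"),
                 Two = c("A", "B", ""),
                 Three = c("", "BA", "CC"),
                 Four = c("AAA", "", "CCC"),
                 stringsAsFactors = FALSE)

# Corrected helper (hypothetical name): return the modified vector explicitly
replace_empty <- function(a) {
  a[a == ""] <- NA
  a
}

# Assigning into df[] replaces every column while keeping the data.frame
df[] <- lapply(df, replace_empty)
df
```

The only change from the question's version is the final line of the function, which returns a instead of the result of the assignment.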
Option#2: dplyr::mutate_all and na_if. Or mutate_if, if the data frame has multiple types of columns
library(dplyr)
mutate_all(df, list(~na_if(.,"")))
# OR, if the data frame also has columns of types other than character:
df %>% mutate_if(is.character, list(~na_if(.,"")))
#    One  Two Three Four
# 1    A    A  <NA>  AAA
# 2 <NA>    B    BA <NA>
# 3    C <NA>    CC  CCC
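In current dplyr releases (1.0.0 and later), mutate_all() and mutate_if() are superseded by across(). A sketch of the equivalent call, assuming dplyr >= 1.0.0 is installed and using the same toy data (df_clean is just an illustrative name):

```r
library(dplyr)

# Toy data, same as in this answer
df <- data.frame(One = c("A", "", "C"),
                 Two = c("A", "B", ""),
                 Three = c("", "BA", "CC"),
                 Four = c("AAA", "", "CCC"),
                 stringsAsFactors = FALSE)

# Apply na_if() to character columns only; other column types are untouched
df_clean <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))
df_clean
```

Restricting the call to character columns matters because na_if() expects the replacement target ("") to be compatible with the column type.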
Toy Data:
df <- data.frame(One=c("A","","C"),
Two=c("A","B",""),
Three=c("","BA","CC"),
Four=c("AAA","","CCC"),
stringsAsFactors = FALSE)
df
#   One Two Three Four
# 1   A   A        AAA
# 2       B    BA
# 3   C        CC  CCC
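Since the question started with na.strings, it may help to see that option on the same toy data. The sketch below assumes the CSV content is supplied inline through the text argument (a stand-in for the OP's file, which is not shown) and that one of the "empty" cells actually holds a single space:

```r
# Inline stand-in for the CSV file; the second field of the last row is a space
csv_lines <- c("One,Two,Three,Four",
               "A,A,,AAA",
               ",B,BA,",
               "C, ,CC,CCC")

# na.strings = "" turns empty fields into NA; strip.white = TRUE trims
# surrounding whitespace before that test, so a blank-only field matches too
df_csv <- read.csv(text = paste(csv_lines, collapse = "\n"),
                   na.strings = "",
                   strip.white = TRUE,
                   stringsAsFactors = FALSE)
df_csv
```

If na.strings = "" appears to have no effect on a real file, whitespace-only cells are one thing worth ruling out.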