I am trying to replace some missing values in my data with the average values from a similar group.
My data looks like this:
   X   Y
1  x   y
2  x   y
3  NA  y
4  x   y
And I want it to look like this:
   X   Y
1  x   y
2  x   y
3  y   y
4  x   y
I wrote this, and it worked:
# Loop over every row; where X is missing, copy in the Y value from the same row
for (i in seq_len(nrow(data.frame))) {
  if (is.na(data.frame$X[i])) {
    data.frame$X[i] <- data.frame$Y[i]
  }
}
But my data.frame is almost half a million rows long, and the for/if statements are pretty slow. What I want is something like:
is.na(data.frame$X) <- data.frame$Y
But this gets a mismatched size error. It seems like there should be a command that does this, but I cannot find it here on SO or on the R help list. Any ideas?
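A vectorized alternative is to index both sides of the assignment with the same logical vector, so the left- and right-hand sides always have the same length. (`is.na<-` does something different: it treats its right-hand side as the positions to set to `NA`, not as replacement values, which is why the attempt above fails.) The sketch below uses a small made-up frame with the question's column names `X` and `Y`; `missing_x` is a hypothetical helper name.

```r
# Small example frame mirroring the question; the values are made up.
df <- data.frame(
  X = c("x", "x", NA, "x"),
  Y = c("y", "y", "y", "y"),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows where X is missing.
missing_x <- is.na(df$X)

# Index both sides with the same logical vector so the lengths match:
# only the missing X values are overwritten, each with its own row's Y.
df$X[missing_x] <- df$Y[missing_x]

print(df)
```

This does the whole column in one pass instead of one row per loop iteration, which is what makes it fast on hundreds of thousands of rows.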
You can replace NA values with zero (0) in the numeric columns of an R data frame using is.na(), replace(), imputeTS::na_replace(), dplyr::coalesce(), dplyr::mutate_at(), dplyr::mutate_if(), or tidyr::replace_na().
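A minimal base-R sketch of the zero-replacement, using only is.na() and replace(); the package helpers listed above are not shown. The data frame and the name `num_cols` are hypothetical.

```r
# Hypothetical data frame with one character and two numeric columns.
df <- data.frame(
  id = c("a", "b", "c"),
  v1 = c(1.5, NA, 3),
  v2 = c(NA, 2L, NA),
  stringsAsFactors = FALSE
)

# Identify the numeric columns so the character column is left untouched.
num_cols <- vapply(df, is.numeric, logical(1))

# For each numeric column, replace the NA positions with 0.
df[num_cols] <- lapply(df[num_cols], function(col) replace(col, is.na(col), 0))

print(df)
```

Note that assigning the double `0` into the integer column `v2` converts that column to double.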
First, if we want to exclude missing values from mathematical operations, use the na.rm = TRUE argument. If you do not exclude these values, summary functions such as mean() and sum() will return NA. We may also want to subset our data to obtain complete observations, that is, the observations (rows) in our data that contain no missing data.
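A short illustration of both points, on made-up values: na.rm = TRUE drops the NA before averaging, and complete.cases() keeps only the rows with no missing values.

```r
x <- c(4, NA, 8)

mean(x)                # NA: the missing value propagates
mean(x, na.rm = TRUE)  # 6: the NA is dropped before averaging

# Hypothetical data frame with a missing value in each of rows 2 and 3.
df <- data.frame(a = c(1, NA, 3), b = c("p", "q", NA), stringsAsFactors = FALSE)

# complete.cases() returns TRUE for rows with no NA in any column.
complete <- df[complete.cases(df), ]
print(complete)  # only row 1 remains
```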
The replace() function, called as replace(x, list, values), returns a copy of the vector x in which the elements at the indices given in list are replaced by those given in values.
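Applied to the question, replace() can fill the missing X values from Y in one call. This is a sketch on made-up vectors; `idx` and `X_filled` are hypothetical names. Passing `Y[idx]` keeps `values` the same length as the positions being replaced.

```r
X <- c("x", "x", NA, "x")
Y <- c("y", "y", "y", "y")

# Logical index of the positions to fill.
idx <- is.na(X)

# Positions where idx is TRUE get the matching elements of Y.
X_filled <- replace(X, idx, Y[idx])
print(X_filled)
```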
ifelse is your friend.
Using Dirk's dataset:
df <- within(df, X <- ifelse(is.na(X), Y, X))
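Since the dataset referred to above is not included here, the sketch below runs the same one-liner on a hypothetical frame shaped like the question's. ifelse() is vectorized: it evaluates is.na(X) for every row at once and takes Y where the test is TRUE, otherwise X. One caveat: if X and Y are factors (the data.frame() default before R 4.0.0), ifelse() returns the underlying integer codes, so the columns are kept as character here.

```r
df <- data.frame(
  X = c("x", "x", NA, "x"),
  Y = c("y", "y", "y", "y"),
  stringsAsFactors = FALSE  # character columns; ifelse() on factors returns integer codes
)

# Vectorized fill: take Y where X is NA, otherwise keep X.
df <- within(df, X <- ifelse(is.na(X), Y, X))

print(df)
```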