I am trying to replace some missing values in my data with the average values from a similar group.
My data looks like this:
X Y
1 x y
2 x y
3 NA y
4 x y
And I want it to look like this:
X Y
1 x y
2 x y
3 y y
4 x y
I wrote this, and it worked
for(i in 1:nrow(data.frame){
if( is.na(data.frame$X[i]) == TRUE){
data.frame$X[i] <- data.frame$Y[i]
}
}
But my data.frame is almost half a million lines long, and the for/if statements are pretty slow. What I want is something like
is.na(data.frame$X) <- data.frame$Y
But this gets a mismatched size error. It seems like there should be a command that does this, but I cannot find it here on SO or on the R help list. Any ideas?
You can replace NA values with zero(0) on numeric columns of R data frame by using is.na() , replace() , imputeTS::replace() , dplyr::coalesce() , dplyr::mutate_at() , dplyr::mutate_if() , and tidyr::replace_na() functions.
First, if we want to exclude missing values from mathematical operations use the na. rm = TRUE argument. If you do not exclude these values most functions will return an NA . We may also desire to subset our data to obtain complete observations, those observations (rows) in our data that contain no missing data.
Replace the Elements of a Vector in R Programming – replace() Function. replace() function in R Language is used to replace the values in the specified string vector x with indices given in list by those given in values.
ifelse
is your friend.
Using Dirk's dataset
df <- within(df, X <- ifelse(is.na(X), Y, X))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With