I have a dataset with 12 columns that contain NA values. I can replace the NA values in a single column with that column's mean like this:
data$F1[which(is.na(data$F1))] <- mean(data$F1,na.rm = TRUE)
and then repeat this separately for each column.
How can I write a for loop that checks each column and replaces NA with the column mean? I tried:
for (i in 1:ncol(data)) {
  data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
}
but I get the error: object not found.
We can use lapply to loop over the columns and replace the NA elements, located with a logical index, with the mean of that column:
data[] <- lapply(data, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
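As a minimal sketch of the same idea on a data frame that also has a non-numeric column: the toy data and its column names (F1, F2, id) are made up for illustration, and restricting the replacement to numeric columns is an assumption about what is wanted, since mean() is not meaningful for character columns.

```r
# Hypothetical toy data; the column names are illustrative only
data <- data.frame(
  F1 = c(1, NA, 3),
  F2 = c(NA, 4, 6),
  id = c("a", "b", NA),
  stringsAsFactors = FALSE
)

# Logical index marking which columns are numeric
num <- vapply(data, is.numeric, logical(1))

# Replace NA with the column mean, in the numeric columns only
data[num] <- lapply(data[num], function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
})
```

Here the NA values in F1 and F2 are filled in, while the NA in the character column id is left untouched.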
Or it is easier with na.aggregate from zoo, which by default uses FUN = mean:
na.aggregate(object, by = 1, ..., FUN = mean, na.rm = FALSE, maxgap = Inf)
library(zoo)
data1 <- na.aggregate(data)
If we are using a for loop, I would recommend subsetting the column with [[ instead of [, as 'data' could be a data.frame, a tbl_df, or a data.table, and all of them work with [[ for extracting a column:
for(i in seq_along(data)) data[[i]][is.na(data[[i]])] <-
mean(data[[i]], na.rm = TRUE)
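To make the difference between [ and [[ concrete, here is a small sketch on a plain data.frame. The toy data is hypothetical, and the loop body is only spelled out in steps for readability; it does the same thing as the one-liner above.

```r
# Hypothetical toy data for illustration
data <- data.frame(F1 = c(2, NA, 4), F2 = c(10, 20, NA))

# `[` keeps the data frame wrapper, `[[` returns the column vector itself
class(data[1])    # "data.frame"
class(data[[1]])  # "numeric"

for (i in seq_along(data)) {
  col <- data[[i]]                            # extract column i as a vector
  col[is.na(col)] <- mean(col, na.rm = TRUE)  # fill NA with the column mean
  data[[i]] <- col                            # write the column back
}
```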
The simplest way I know to replace all the NA values with column means is the tidyr package's replace_na function. The two arguments you need are the data frame and a named list of replacement values; each value is matched to a column by name, so the element named after a column is what that column's NAs are replaced with.
You can use lapply to get the column means:
library(tidyr)
col_means <- lapply(data, mean, na.rm = TRUE)
data1 <- replace_na(data, col_means)