I have a dataset with 12 columns that contain NA values. I can replace the NA values in one column with that column's mean like this:
data$F1[which(is.na(data$F1))] <- mean(data$F1,na.rm = TRUE)
and then repeat this separately for each column.
How can I write a for loop that checks each column and replaces its NA values with the column mean? I tried:
for(i in 1:ncol(data)){
  data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE)
}
I get the error: object not found.
We can use lapply to loop over the columns and replace the NA elements of each column with that column's mean:
data[] <- lapply(data, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
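As a quick, self-contained check of the lapply approach, here is a minimal sketch on a small toy data frame; the name df and its values are hypothetical, chosen only for illustration. Assigning to df[] (rather than df) keeps the result a data.frame instead of turning it into a plain list.

```r
# Hypothetical toy data: two numeric columns, each with one NA
df <- data.frame(F1 = c(1, NA, 3), F2 = c(NA, 4, 8))

# Replace each column's NAs with that column's mean (NAs ignored when averaging)
df[] <- lapply(df, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))

print(df)
```

Here the NA in F1 becomes 2 (the mean of 1 and 3) and the NA in F2 becomes 6 (the mean of 4 and 8).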
Alternatively, it is easier with na.aggregate from the zoo package, whose default method uses FUN = mean:
na.aggregate(object, by = 1, ..., FUN = mean, na.rm = FALSE, maxgap = Inf)
library(zoo)
data1 <- na.aggregate(data)
If we are using a for loop, I would recommend extracting the column with [[ instead of [, because 'data' could be a data.frame, tbl_df, or data.table, and all of them work with [[ for extracting a column:
for(i in seq_along(data)) data[[i]][is.na(data[[i]])] <-
mean(data[[i]], na.rm = TRUE)
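The [[ loop can be checked the same way. This is a minimal sketch that uses only base R and a hypothetical toy data frame named df; it does not exercise the tbl_df or data.table cases mentioned above.

```r
# Hypothetical toy data with missing values in both columns
df <- data.frame(a = c(2, NA, 4, NA), b = c(NA, 10, NA, 20))

# Loop over column positions; [[ extracts (and assigns back) a whole column
for (i in seq_along(df)) {
  df[[i]][is.na(df[[i]])] <- mean(df[[i]], na.rm = TRUE)
}

print(df)
```

Column a's NAs become 3 (the mean of 2 and 4), and column b's NAs become 15 (the mean of 10 and 20).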
The simplest way I know to replace all the NA values with column means is the replace_na function from the tidyr package. The two arguments you need are the data frame and a named list of replacement values, one per column: the NAs in each column are replaced with the list element whose name matches that column's name.
You can use lapply to get the column means:
library(tidyr)
col_means <- lapply(data, mean, na.rm = TRUE)
data1 <- replace_na(data, col_means)