I am not sure how to loop over each column to replace the NA values with the column mean. Replacing them in a single column with the following works well:
Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE))
The code for looping over columns is not working:
for(i in 1:ncol(data)){ data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE)) }
The values are not replaced. Can someone please help me with this?
An easy way to replace NAs with the mean in multiple columns is to use mutate_at() and vars() from dplyr. These functions let you select the columns in which you want to replace the missing values. To actually replace each NA with the mean, you can combine replace_na() from tidyr with mean(), passing na.rm = TRUE so the mean is computed from the non-missing values only.
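As a minimal sketch of that approach, assuming dplyr and tidyr are installed: the data frame df and its columns id, x and y are made up for illustration and are not from the question. Note that in dplyr 1.0.0 and later mutate_at() is superseded by across(), although it still works.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data; column names are invented for illustration.
df <- data.frame(
  id = 1:4,
  x  = c(1, NA, 3, 5),
  y  = c(NA, 10, 20, 30)
)

# Replace NA with the column mean, but only in the columns named in vars().
# .x stands for the current column inside the formula.
df_filled <- df %>%
  mutate_at(vars(x, y), ~ replace_na(.x, mean(.x, na.rm = TRUE)))

print(df_filled)
```

The id column is left alone because it is not listed in vars(). With across(), the same step would be written as mutate(across(c(x, y), ~ replace_na(.x, mean(.x, na.rm = TRUE)))).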
More generally, you can impute missing values in a numeric column with a measure of central tendency such as the mean, median or mode. In Python, the pandas DataFrame method fillna() can be used to replace the missing values, and the mean(), median() and mode() methods of a DataFrame compute the values to fill in.
To replace missing values in R with the column minimum, you can use the tidyverse packages. First, use mutate() to specify the column in which you want to replace the missing values. Second, call replace() together with is.na() to identify the NAs and substitute them with the column's lowest value, computed with min() and na.rm = TRUE.
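The two steps above could look roughly like this. It is only a sketch: the data frame df and the column score are hypothetical names chosen for the example, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical example data with two missing scores.
df <- data.frame(score = c(7, NA, 2, 9, NA))

# replace(x, index, value) returns x with x[index] set to value.
# min() needs na.rm = TRUE, otherwise it would itself return NA.
df_min <- df %>%
  mutate(score = replace(score, is.na(score), min(score, na.rm = TRUE)))

print(df_min)
```

The same pattern works for the mean or median by swapping min() for mean() or median().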
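Your loop fails because data[i] returns a one-column data frame rather than a vector, so mean(data[i], na.rm = TRUE) returns NA (with a warning) and the NAs are overwritten with NA. A relatively simple modification of your code, indexing the column as a vector with data[,i], should solve the issue (wrap the mean in round() if you want rounded values, as in your single-column version):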
for(i in 1:ncol(data)){ data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE) }
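If the data frame also contains non-numeric columns, mean() will warn and return NA for them. One possible variant, shown here only as a sketch, skips those columns and uses data[[i]], which extracts the column as a vector. The example data and its column names (a, b, label) are invented for illustration.

```r
# Hypothetical example data: two numeric columns and one character column.
data <- data.frame(
  a     = c(1, NA, 3),
  b     = c(NA, 4, 8),
  label = c("u", NA, "w")
)

for (i in seq_len(ncol(data))) {
  # Only numeric columns have a meaningful mean; leave the others untouched.
  if (is.numeric(data[[i]])) {
    data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm = TRUE)
  }
}

print(data)
```

seq_len(ncol(data)) is used instead of 1:ncol(data) so that the loop body is skipped entirely for a data frame with zero columns.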