I want to replace the NAs in a column of a data.table with the mean of that same column. I am doing the following, but it is not working.
ww <- data.table(iris)
ww <- ww[1:5 , ]
ww[1,1] <- NA
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:           NA         3.5          1.4         0.2  setosa
2:          4.9         3.0          1.4         0.2  setosa
3:          4.7         3.2          1.3         0.2  setosa
4:          4.6         3.1          1.5         0.2  setosa
5:          5.0         3.6          1.4         0.2  setosa
ww[is.na(Sepal.Length) , Sepal.Length:= mean(Sepal.Length, na.rm = T)]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:          NaN         3.5          1.4         0.2  setosa
2:          4.9         3.0          1.4         0.2  setosa
3:          4.7         3.2          1.3         0.2  setosa
4:          4.6         3.1          1.5         0.2  setosa
5:          5.0         3.6          1.4         0.2  setosa
Why am I getting NaN in place of NA when it should have been the mean of the rest of the values (4.9, 4.7, 4.6, 5.0)?
What is an alternative way of achieving this, in case something is wrong with this syntax?
I want the syntax for data.table.
The easiest way to replace NAs with the mean in multiple columns of a data frame is to use dplyr's mutate_at() and vars(). These functions let you select the columns in which to replace the missing values. To actually replace the NAs with the mean, you can use replace_na() (from tidyr) together with mean().
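A minimal sketch of that dplyr approach, in case it helps. The data frame df and its columns a, b and id are made up for illustration. Note that mutate_at() is superseded by across() in recent dplyr versions, though it still works.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: two numeric columns with one NA each
df <- data.frame(
  a  = c(NA, 2, 4),
  b  = c(1, NA, 5),
  id = c("x", "y", "z")
)

# For each selected column, replace NA with that column's mean
df_filled <- df %>%
  mutate_at(vars(a, b), ~ replace_na(.x, mean(.x, na.rm = TRUE)))

print(df_filled)
```

The id column is left untouched because it is not listed in vars().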
The classic way to replace NAs in R is with the is.na() function. is.na() takes a vector or data frame as input and returns a logical object that indicates whether each value is missing (TRUE or FALSE). You can then use this logical object to select the missing values and assign them a replacement, such as zero.
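A small base R sketch of the is.na() approach on a plain vector. The vector x reuses the values from the question, and the names x_zero and x_mean are just for illustration.

```r
# Values from the question, with the first one missing
x <- c(NA, 4.9, 4.7, 4.6, 5.0)

# Logical vector: TRUE where a value is missing
missing <- is.na(x)

# Replace the missing values with zero
x_zero <- x
x_zero[missing] <- 0

# Replace the missing values with the mean of the non-missing ones
x_mean <- x
x_mean[missing] <- mean(x, na.rm = TRUE)

print(x_zero)
print(x_mean)
```

Here mean() is computed on the full vector x, so na.rm = TRUE still has four real values to average.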
na.aggregate in the zoo package replaces NAs with the mean of the non-NAs in the same column:
library(zoo)
ww[, Sepal.Length := na.aggregate(Sepal.Length)]
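As for why the original attempt returns NaN: as far as I understand data.table, when i is supplied, j is evaluated only on the rows selected by i. Here that is just the single NA row, so mean(..., na.rm = TRUE) is taken over nothing and gives NaN. Below is a minimal sketch showing this, plus a one-line workaround that refers to the full column as ww$Sepal.Length. The workaround and the name subset_mean are my own additions, not something from the question.

```r
library(data.table)

# Same data as in the question
ww <- data.table(iris[1:5, ])
ww[1, Sepal.Length := NA]

# j only sees the rows selected by i (the single NA row),
# so after dropping the NA there is nothing left to average
subset_mean <- ww[is.na(Sepal.Length), mean(Sepal.Length, na.rm = TRUE)]
print(subset_mean)

# Workaround: ww$Sepal.Length is the whole column, not the subset
ww[is.na(Sepal.Length), Sepal.Length := mean(ww$Sepal.Length, na.rm = TRUE)]
print(ww)
```

The first row should now hold the mean of the other four values (4.9, 4.7, 4.6, 5.0).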
While the zoo answer is pretty nice, it requires a new dependency.
Using just data.table, you could do the following.
library(data.table)
# prepare data
ww = data.table(iris[1:5,])
ww[1, Sepal.Length := NA]
# solution
ww[, Sepal.Length.mean := mean(Sepal.Length, na.rm = TRUE) # calculate mean
][is.na(Sepal.Length), Sepal.Length := Sepal.Length.mean # replace NA with mean
][, Sepal.Length.mean := NULL # remove mean col
][] # just prints
While it may look bulky compared to the zoo version, it performs well because every step updates by reference using :=.
It can also be easily adapted to replace NA with the group mean, just by using the by argument in data.table.
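A minimal sketch of the by-group variant, assuming a made-up table dt with a grouping column g and a value column x (both names are hypothetical). The only change from the solution above is the by = g in the first step.

```r
library(data.table)

# Hypothetical data: two groups, one NA in each
dt <- data.table(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(NA, 1, 3, NA, 10, 20)
)

dt[, x.mean := mean(x, na.rm = TRUE), by = g  # per-group mean
  ][is.na(x), x := x.mean                     # replace NA with its group's mean
  ][, x.mean := NULL                          # remove helper column
  ][]                                         # just prints
```

Each NA is filled with the mean of its own group rather than the overall column mean.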