
Find and replace missing values with row mean

I have a data frame with NAs, and I want to replace each NA with the mean of its row:

c1 = c(1,2,3,NA)
c2 = c(3,1,NA,3)
c3 = c(2,1,3,1)

df = data.frame(c1,c2,c3)

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3 NA  3
4 NA  3  1

so that the result is:

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1
Brian asked Jul 23 '13 14:07


3 Answers

Very similar to @baptiste's answer

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
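
To make the two steps easier to follow, here is a minimal sketch of the same approach on the question's data with the intermediate values spelled out. The name `row_means` is only illustrative and is not part of the original answer.

```r
# Example data from the question
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# (row, col) position of every NA, listed column by column
ind <- which(is.na(df), arr.ind = TRUE)

# Mean of every row, ignoring NAs (illustrative helper variable)
row_means <- rowMeans(df, na.rm = TRUE)

# For each NA, look up the mean of the row it sits in, then fill by matrix index
df[ind] <- row_means[ind[, "row"]]
```

Indexing `row_means` by the row number of each NA is what keeps every mean paired with the right cell, regardless of the order in which `which()` reports the NAs.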
Jilber Urbina answered Oct 19 '22 12:10


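I think this works,

df[which(is.na(df), arr.ind=TRUE)] <- rowMeans(df[!complete.cases(df), ], na.rm=TRUE)

but only when each incomplete row has exactly one NA and the NAs, read column by column, appear in increasing row order. The NA positions are listed column by column while the row means are listed row by row, so on the question's data the two fills end up swapped. Indexing the row means by each NA's row number (ind[,1]) avoids this.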
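
As a rough check of how the one-liner pairs positions with values, the sketch below runs it on a copy of the question's data and compares it with a lookup by row number. The names `na_rows`, `vals`, `bad` and `good` are illustrative only.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# NA positions come back column by column: row 4 (c1) first, then row 3 (c2)
ind <- which(is.na(df), arr.ind = TRUE)
na_rows <- unname(ind[, "row"])

# Means of the incomplete rows come back in row order: row 3 first, then row 4
vals <- rowMeans(df[!complete.cases(df), ], na.rm = TRUE)

# The one-liner pairs them position by position, so the fills are swapped here
bad <- df
bad[ind] <- vals

# Looking each mean up by the NA's own row keeps the pairing right
good <- df
good[ind] <- rowMeans(df, na.rm = TRUE)[na_rows]
```

Here `bad` gets 3 in row 4 and 2 in row 3, while `good` matches the output the question asks for.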
baptiste answered Oct 19 '22 14:10


Using apply (note the returned object is a matrix):

t( apply( df , 1 , function(x) { x[ is.na(x) ] = mean( x , na.rm = TRUE ); x } ) )
     c1 c2 c3
[1,]  1  3  2
[2,]  2  1  1
[3,]  3  3  3
[4,]  2  3  1

We use an anonymous function to replace each NA in a row with the mean of that row. The only advantage is that you don't have to do any more typing if the number of rows increases. It is not particularly efficient or fast in a computational sense, but it is easy to reason about, and you won't notice the cost unless you have hundreds of thousands of rows.
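
If a data frame is needed rather than a matrix, one possible follow-up is to wrap the result in `as.data.frame()`. This is a sketch on the question's data; `fill_row` and `filled` are hypothetical names introduced here for readability.

```r
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Replace the NAs in one row (a numeric vector) with that row's mean
fill_row <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# apply() returns one column per input row, so transpose, then convert back
filled <- as.data.frame(t(apply(df, 1, fill_row)))
```

This assumes every column is numeric, because `apply()` coerces the data frame to a matrix before the function is called on each row.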

Simon O'Hanlon answered Oct 19 '22 12:10