aggregate + mean returns wrong result

Tags:

r

aggregate

mean

Using R, I am trying to calculate groupwise means with aggregate(..., mean). However, the returned mean is wrong.

testdata <-read.table(text="
a  b    c   d   year
2   10  1   NA  1998
1   7   NA  NA  1998
4   6   NA  NA  1998
2   2   NA  NA  1998
4   3   2   1   1998
2   6   NA  NA  1998
3   NA  NA  NA  1998
2   7   NA  3   1998
1   8   NA  4   1998
2   7   2   5   1998
1   NA  NA  4   1998
2   5   NA  6   1998
2   4   NA  NA  1998
3   11  2   7   1998
1   18  4   10  1998
3   12  7   5   1998
2   17  NA  NA  1998
2   11  4   5   1998
1   3   1   1   1998
3   5   1   3   1998
",header=TRUE,sep="")
aggregate(. ~ year, testdata,
          function(x) c(mean = round(mean(x, na.rm=TRUE), 2)))
colMeans(subset(testdata, year=="1998", select=d), na.rm=TRUE)

aggregate reports a mean of 4.62 for d in group 1998, but colMeans gives the correct value, 4.5.

When I reduce the data to a single value column (d plus year), aggregate gets it right:

aggregate(. ~ year, testdata[4:5],
          function(x) c(mean = round(mean(x, na.rm=TRUE), 2)))
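A minimal sketch of why the reduced call agrees with colMeans, assuming only the default na.action = na.omit of aggregate's formula method: count how many rows survive na.omit() with all columns versus with only d and year. The data are the question's, rebuilt with data.frame() to keep the snippet short.

```r
# The question's data, rebuilt with data.frame() instead of read.table()
testdata <- data.frame(
  a = c(2, 1, 4, 2, 4, 2, 3, 2, 1, 2, 1, 2, 2, 3, 1, 3, 2, 2, 1, 3),
  b = c(10, 7, 6, 2, 3, 6, NA, 7, 8, 7, NA, 5, 4, 11, 18, 12, 17, 11, 3, 5),
  c = c(1, NA, NA, NA, 2, NA, NA, NA, NA, 2, NA, NA, NA, 2, 4, 7, NA, 4, 1, 1),
  d = c(NA, NA, NA, NA, 1, NA, NA, 3, 4, 5, 4, 6, NA, 7, 10, 5, NA, 5, 1, 3),
  year = 1998L
)

# aggregate(. ~ year, ...) applies na.omit() by default, so these counts are
# the rows that actually reach mean() in each call
nrow(na.omit(testdata))       # 8: rows with no NA in a, b, c or d
nrow(na.omit(testdata[4:5]))  # 12: rows with a non-NA d
```

With only d and year, na.omit() removes exactly the rows that mean(x, na.rm=TRUE) would skip anyway, so both approaches average the same 12 values.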

What's wrong with my aggregate() + mean() function?

asked Dec 12 '25 23:12 by MERose


1 Answer

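The formula method of aggregate uses na.action = na.omit by default, so it drops every row that has an NA in any column before the data are passed to mean. For d, that leaves only the 8 complete rows: (1 + 5 + 7 + 10 + 5 + 5 + 1 + 3) / 8 = 4.625, which rounds to 4.62. You can confirm this by running your aggregate call without na.rm=TRUE: it still returns numbers rather than NA, because no NAs are left by the time mean is called.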
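A quick way to reproduce the 4.62, as a sketch using the question's data (rebuilt with data.frame()): select the complete rows by hand with complete.cases(), average d there, and compare with the plain na.rm=TRUE mean.

```r
# The question's data, rebuilt with data.frame() instead of read.table()
testdata <- data.frame(
  a = c(2, 1, 4, 2, 4, 2, 3, 2, 1, 2, 1, 2, 2, 3, 1, 3, 2, 2, 1, 3),
  b = c(10, 7, 6, 2, 3, 6, NA, 7, 8, 7, NA, 5, 4, 11, 18, 12, 17, 11, 3, 5),
  c = c(1, NA, NA, NA, 2, NA, NA, NA, NA, 2, NA, NA, NA, 2, 4, 7, NA, 4, 1, 1),
  d = c(NA, NA, NA, NA, 1, NA, NA, 3, 4, 5, 4, 6, NA, 7, 10, 5, NA, 5, 1, 3),
  year = 1998L
)

# Rows kept by na.omit(): only those with no NA in any column
complete <- testdata[complete.cases(testdata), ]

mean(complete$d)                # 4.625, what aggregate() averaged for d
mean(testdata$d, na.rm = TRUE)  # 4.5, the intended per-column mean
```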

To fix this, change the na.action of aggregate to na.pass, so that all rows are kept and mean(x, na.rm=TRUE) drops the NAs separately for each column:

aggregate(. ~ year, testdata,
          function(x) c(mean = round(mean(x, na.rm=TRUE), 2)), na.action = na.pass)


  year    a    b    c   d
1 1998 2.15 7.89 2.67 4.5
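An alternative that avoids na.action altogether, sketched here on the assumption that the formula interface is not required: the data.frame method of aggregate (called with by = and FUN =) has no na.action argument and passes every value of each column to FUN, so na.rm=TRUE alone handles the NAs. The data are again the question's, rebuilt with data.frame().

```r
# The question's data, rebuilt with data.frame() instead of read.table()
testdata <- data.frame(
  a = c(2, 1, 4, 2, 4, 2, 3, 2, 1, 2, 1, 2, 2, 3, 1, 3, 2, 2, 1, 3),
  b = c(10, 7, 6, 2, 3, 6, NA, 7, 8, 7, NA, 5, 4, 11, 18, 12, 17, 11, 3, 5),
  c = c(1, NA, NA, NA, 2, NA, NA, NA, NA, 2, NA, NA, NA, 2, 4, 7, NA, 4, 1, 1),
  d = c(NA, NA, NA, NA, 1, NA, NA, 3, 4, 5, 4, 6, NA, 7, 10, 5, NA, 5, 1, 3),
  year = 1998L
)

# Data.frame method: each column is split by year and handed to FUN in full,
# so no rows are dropped before mean() sees them
res <- aggregate(testdata[c("a", "b", "c", "d")],
                 by = list(year = testdata$year),
                 FUN = function(x) round(mean(x, na.rm = TRUE), 2))
res
```

This should give the same values as the na.pass call above: 2.15, 7.89, 2.67 and 4.5.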
answered Dec 15 '25 12:12 by jeremycg