
What is the difference between the functions tapply and ave?

I can't wrap my mind around the ave function. I have read the help page and searched the net, but I still cannot understand what it does. I understand that it applies a function to subsets of observations, but not in the same way as, for example, tapply.

Could someone please enlighten me, perhaps with a small example?

Thanks, and excuse me for perhaps an unusual request.

ECII asked Mar 09 '14 22:03


1 Answer

tapply returns one result per factor level. ave also computes one result per factor level, but it returns a vector the same length as the input, with each group's value copied to every position that belongs to that group.

ave is handy for producing a new column in a data frame with summary data.

A short example:

tapply(iris$Sepal.Length, iris$Species, FUN=mean)
    setosa versicolor  virginica 
     5.006      5.936      6.588 

One value, the mean for each factor level.
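To make the shape of that result concrete, here is a minimal sketch (the variable names `means` and `expanded` are only illustrative). It shows that tapply gives a short array named by factor level, and that looking up each row's level in it reproduces what ave returns in a single call:

```r
# tapply: one element per factor level, named by the level
means <- tapply(iris$Sepal.Length, iris$Species, FUN = mean)
length(means)   # number of species
names(means)    # the factor levels

# Expand the per-level result back to one value per row by name lookup;
# as.vector() drops the names and array attributes left by the lookup
expanded <- as.vector(means[as.character(iris$Species)])

# This should match what ave computes directly
all.equal(expanded, ave(iris$Sepal.Length, iris$Species, FUN = mean))
```

In other words, ave can be thought of as tapply followed by this expansion step, at least when the function returns a single value per group.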

ave on iris produces 150 results, which line up with the original data frame:

ave(iris$Sepal.Length, iris$Species, FUN=mean)
  [1] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
 [17] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
 [33] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
 [49] 5.006 5.006 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
 [65] 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
 [81] 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
 [97] 5.936 5.936 5.936 5.936 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[113] 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[129] 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[145] 6.588 6.588 6.588 6.588 6.588 6.588

As noted in the comments, the single value returned for each group is recycled to fill every position of that group in the original data.
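Because the result lines up with the rows, the "new column" use mentioned earlier can be sketched as follows. The column names `Species.Mean` and `Centered` are made up for illustration, and the work is done on a copy of iris:

```r
# Work on a copy so the built-in dataset is left untouched
df <- iris

# One value per row: the mean Sepal.Length of that row's species.
# FUN must be passed by name, since unnamed arguments after the first
# are treated as grouping variables.
df$Species.Mean <- ave(df$Sepal.Length, df$Species, FUN = mean)

# A typical follow-up: deviation of each observation from its group mean
df$Centered <- df$Sepal.Length - df$Species.Mean

head(df[, c("Species", "Sepal.Length", "Species.Mean", "Centered")], 3)
```

Doing the same with tapply would need an extra step to match the three per-species values back to the 150 rows.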

If the function returns multiple values, they are assigned to the group's positions in order, and recycled if necessary to fill them. For example:

d <- data.frame(a=rep(1:2, each=5), b=1:10)
ave(d$b, d$a, FUN=rev)
 [1]  5  4  3  2  1 10  9  8  7  6
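Another sketch along the same lines, using cumsum as an example of a function that returns one value per element (the names `running` and `by_group` are illustrative). It also shows how the two functions differ in what they hand back for such a function:

```r
d <- data.frame(a = rep(1:2, each = 5), b = 1:10)

# ave: running total within each group, in the original row order
running <- ave(d$b, d$a, FUN = cumsum)
running
# 1 3 6 10 15 6 13 21 30 40

# tapply: the same per-group results, but returned as a list-like array
# with one element per group rather than one value per row
by_group <- tapply(d$b, d$a, FUN = cumsum)
by_group[[1]]
# 1 3 6 10 15
```

So with a vector-valued function, ave still gives a vector aligned with the data, while tapply gives one (vector) element per factor level.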

Thanks to Josh and thelatemail.

Matthew Lundberg answered Nov 14 '22 23:11