Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference of tapply and aggregate in R? [closed]

Aaa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), 
                  card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))

aggregate(x=Aaa$amount, by=list(Aaa$card), FUN=mean)

##   Group.1    x
## 1       a 1.50
## 2       b 1.25
## 3       c 1.60

tapply(Aaa$amount, Aaa$card, mean)

##    a    b    c 
## 1.50 1.25 1.60 

Above is an example code.

It seems that aggregate and tapply both are very handy and perform similar functionality.

Can someone explain or give examples on their differences?

like image 791
Neo XU Avatar asked Dec 05 '22 05:12

Neo XU


1 Answers

aggregate is designed to work on multiple columns with one function and returns a dataframe with one row for each category, while tapply is designed to work on a single vector with results returned as a matrix or array. Only using a two-column matrix does not really allow the capacities of either function (or their salient differences) to be demonstrated. aggregate also has a formula method, which tapply does not.

> Aaa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), cat=sample(letters[21:24], 15,rep=TRUE),
+                   card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))
> with( Aaa, tapply(amount, INDEX=list(cat,card), mean) )
    a   b   c
u 1.5 1.5  NA
v 2.0 1.0 2.0
w 1.0  NA 1.5
x 1.5  NA 1.5

>  aggregate(amount~cat+card, data=Aaa, FUN= mean) 
  cat card amount
1   u    a    1.5
2   v    a    2.0
3   w    a    1.0
4   x    a    1.5
5   u    b    1.5
6   v    b    1.0
7   v    c    2.0
8   w    c    1.5
9   x    c    1.5

The xtabs function also delivers an R "table" and it has a formula interface. R tables are matrices that typically have integer values because they are designed to be "contingency tables" holding counts of items in cross-classifications of the marginal categories.

like image 69
IRTFM Avatar answered Mar 23 '23 17:03

IRTFM