Can someone help me understand the difference between aggregate and ddply with the following example:
A data frame:
mydat <- data.frame(first = rpois(10,10), second = rpois(10,10),
third = rpois(10,10), group = c(rep("a",5),rep("b",5)))
Use aggregate to apply a function to a part of the data frame split by a factor:
aggregate(mydat[,1:3], by=list(mydat$group), mean)
Group.1 first second third
1 a 8.8 8.8 10.2
2 b 6.8 9.4 13.4
Try to use aggregate for another function (returns an error message):
aggregate(mydat[,1:3], by=list(mydat$group), function(u) cor(u$first,u$second))
Error in u$second : $ operator is invalid for atomic vectors
Now, try the same with ddply (plyr package):
ddply(mydat, .(group), function(u) cor(u$first,u$second))
group V1
1 a -0.5083042
2 b -0.6329968
All tips, links, criticism are highly appreciated.
aggregate calls FUN on each column independently, which is why you get independent means. ddply is going to pass all columns to the function. A quick demonstration of what is being passed in aggregate may be in order:
Some sample data for demonstration:
d <- data.frame(a=1:4, b=5:8, c=c(1,1,2,2))
> d
a b c
1 1 5 1
2 2 6 1
3 3 7 2
4 4 8 2
By using the function print and ignoring the result of the commands aggregate or ddply, we can see what gets passed to the function in each iteration.
aggregate:
tmp <- aggregate(d[1:2], by=list(d$c), print)
[1] 1 2
[1] 3 4
[1] 5 6
[1] 7 8
Note that individual columns are sent to print.
ddply:
tmp <- ddply(d, .(c), print)
a b c
1 1 5 1
2 2 6 1
a b c
3 3 7 2
4 4 8 2
Note that data frames are being sent to print.
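The same contrast can be checked without printing, using only base R. This is a minimal sketch: it asks for the class of whatever gets handed to FUN, and it uses split() plus sapply() as a stand-in for ddply's per-group data frame (an assumption made so the example runs without plyr). The names agg_classes and split_classes are just illustrative.

```r
d <- data.frame(a = 1:4, b = 5:8, c = c(1, 1, 2, 2))

# aggregate() calls FUN once per column per group, so FUN sees plain vectors
agg_classes <- aggregate(d[1:2], by = list(d$c), FUN = function(x) class(x))

# split() hands FUN the whole sub-data-frame for each group
split_classes <- sapply(split(d, d$c), function(x) class(x))

print(agg_classes)
print(split_classes)
```

If FUN needs two columns at once, as cor(u$first, u$second) does, it has to receive a data frame, which is why the aggregate call in the question fails.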
You've already been told why aggregate was the wrong {base} function to use for a function that requires two vectors as arguments, but you haven't yet been told which non-ddply approach would have succeeded.
The by( ... grp, FUN) method:
> cbind (by( mydat, mydat["group"], function(d) cor(d$first, d$second)) )
[,1]
a 0.6529822
b -0.1964186
The sapply(split( ..., grp), fn) method:
> sapply( split( mydat, mydat["group"]), function(d) cor(d$first, d$second))
a b
0.6529822 -0.1964186
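Since mydat is built from rpois(), the numbers above will differ from run to run. A reproducible sketch of the two base approaches side by side might look like this; the seed value and the names by_cor and split_cor are arbitrary choices for illustration, not part of the original question.

```r
set.seed(42)  # arbitrary seed, only so the draws are repeatable
mydat <- data.frame(first = rpois(10, 10), second = rpois(10, 10),
                    third = rpois(10, 10), group = c(rep("a", 5), rep("b", 5)))

# by(): FUN receives the per-group data frame; as.numeric() drops the "by" wrapper
by_cor <- as.numeric(by(mydat, mydat$group, function(d) cor(d$first, d$second)))

# split() + sapply(): same idea, returns a named numeric vector
split_cor <- sapply(split(mydat, mydat$group), function(d) cor(d$first, d$second))

# Both routes should give the same per-group correlations
print(all.equal(by_cor, unname(split_cor)))
```

Both pass a data frame per group to the function, so either works wherever the function needs more than one column.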