Does anyone know a slick way to order the results coming out of a ddply summarise operation?
This is what I'm doing to get the output ordered by descending depth.
ddims <- ddply(diamonds, .(color), summarise, depth = mean(depth), table = mean(table))
ddims <- ddims[order(-ddims$depth),]
With output...
> ddims
color depth table
7 J 61.88722 57.81239
6 I 61.84639 57.57728
5 H 61.83685 57.51781
4 G 61.75711 57.28863
1 D 61.69813 57.40459
3 F 61.69458 57.43354
2 E 61.66209 57.49120
Not too ugly, but I'm hoping for a way to do it nicely within ddply(). Does anyone know how?
Hadley's ggplot2 book has this example for ddply and subset, but it's not actually sorting the output, just selecting the two smallest diamonds per group.
ddply(diamonds, .(color), subset, order(carat) <= 2)
I'll use this occasion to advertise a bit for data.table, which is faster to run and (in my perception) at least as elegant to write:
library(data.table)
ddims <- data.table(diamonds)
system.time(ddims <- ddims[, list(depth=mean(depth), table=mean(table)), by=color][order(-depth)])  # -depth gives the descending order asked for
user system elapsed
0.003 0.000 0.004
By contrast, without ordering, your ddply code already takes 30 times longer:
user system elapsed
0.106 0.010 0.119
With all the respect I have for Hadley's excellent work, e.g. on ggplot2, and general awesomeness, I must confess that for me, data.table entirely replaced ddply, for speed reasons.
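Yes, to sort you can just nest the ddply in another ddply. The outer call is given no function, so it simply returns the rows ordered by its grouping variable (ascending). Here's how you would use ddply to sort on one column, for example your table column: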
ddimsSortedTable <- ddply(ddply(diamonds, .(color),
summarise, depth = mean(depth), table = mean(table)), .(table))
color depth table
1 G 61.75711 57.28863
2 D 61.69813 57.40459
3 F 61.69458 57.43354
4 E 61.66209 57.49120
5 H 61.83685 57.51781
6 I 61.84639 57.57728
7 J 61.88722 57.81239
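The nested call above always sorts ascending. For the descending order asked for in the question, one possible alternative is plyr's own arrange() together with desc(). The sketch below is only an illustration: it uses a small made-up data frame (toy, with hypothetical values) in place of diamonds so that it runs without ggplot2.

```r
library(plyr)

# Toy stand-in for the diamonds data; the values are made up for illustration
toy <- data.frame(
  color = c("D", "D", "E", "E", "J", "J"),
  depth = c(61, 62, 60, 61, 62, 63),
  table = c(57, 58, 56, 58, 59, 57)
)

# Summarise per color, as in the question
means <- ddply(toy, .(color), summarise, depth = mean(depth), table = mean(table))

# Sort the summary by descending mean depth
sorted <- arrange(means, desc(depth))
sorted
```

arrange() also resets the row names, so the result is numbered 1, 2, 3 rather than keeping the pre-sort row numbers that order() leaves behind.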
If you are using dplyr, I would recommend taking advantage of the %>% pipe operator (written %.% in early dplyr releases), which leads to more intuitive code.
data(diamonds, package = 'ggplot2')
library(dplyr)
diamonds %>%
  group_by(color) %>%
  summarise(
    depth = mean(depth),
    table = mean(table)
  ) %>%
  arrange(desc(depth))
A bit late to the party, but things might be a bit different with dplyr. Borrowing crayola's solution for data.table:
library(microbenchmark)
library(data.table)
library(dplyr)
data(diamonds, package = 'ggplot2')
dat1 <- microbenchmark(
  dtbl <- data.table(diamonds)[, list(depth = mean(depth), table = mean(table)), by = color][order(-depth)],
  dplyr_dtbl <- arrange(summarise(group_by(tbl_dt(diamonds), color), depth = mean(depth), table = mean(table)), -depth),
  dplyr_dtfr <- arrange(summarise(group_by(tbl_df(diamonds), color), depth = mean(depth), table = mean(table)), -depth),
  times = 20,
  unit = "ms"
)
The results show that dplyr with tbl_dt is a bit slower than the data.table approach. However, dplyr on a plain data frame (tbl_df) is the fastest of the three, at roughly half the median time of data.table:
expr min lq median uq max neval
data.table 9.606571 10.968881 11.958644 12.675205 14.334525 20
dplyr_data.table 13.553307 15.721261 17.494500 19.544840 79.771768 20
dplyr_data.frame 4.643799 5.148327 5.887468 6.537321 7.043286 20
Note: I have changed the expression names in the output so the microbenchmark results are more readable.