I'm trying to replace all my plyr
calls with dplyr
. There are still a few snags and one of them is with the group_by
function. I imagine it acts the same way as the second ddply
argument and does a split, apply and combine based on the grouping variables I list. But that doesn't appear to be the case. Here is a rather trivial example.
Let's define a silly function
mm <- function(x) return(x[1:5, ])
Now we can split the species in the iris
dataset like so and apply this function to each piece.
ddply(iris, .(Species), mm)
This works as intended. However, when I try the same with dplyr
, it doesn't work as expected.
iris %>% group_by(Species) %>% mm
What am I doing wrong?
As shown in ?do
, you can refer to a group with .
in your expression. The following will replicate your ddply
output:
iris %>% group_by(Species) %>% do(.[1:5, ])
# Source: local data frame [15 x 5]
# Groups: Species
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 7.0 3.2 4.7 1.4 versicolor
# 7 6.4 3.2 4.5 1.5 versicolor
# 8 6.9 3.1 4.9 1.5 versicolor
# 9 5.5 2.3 4.0 1.3 versicolor
# 10 6.5 2.8 4.6 1.5 versicolor
# 11 6.3 3.3 6.0 2.5 virginica
# 12 5.8 2.7 5.1 1.9 virginica
# 13 7.1 3.0 5.9 2.1 virginica
# 14 6.3 2.9 5.6 1.8 virginica
# 15 6.5 3.0 5.8 2.2 virginica
More generally, to apply a custom function to groups with dplyr
, you can do something like the following (thanks @docendodiscimus):
iris %>% group_by(Species) %>% do(mm(.))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With