
ddply -> dplyr: .fun = summarize with several rows

Tags: r, dplyr, plyr

This is somewhat of a follow-up to this question. I want to use dplyr functions instead of ddply to apply a function that yields several rows per group, with those rows included directly in the result. I guess this is best explained by the following example:

library(plyr)
#library(dplyr)

dfx <- data.frame(
    group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
    sex = sample(c("M", "F"), size = 29, replace = TRUE),
    age = runif(n = 29, min = 18, max = 54)
    )

p <- c(.2,.4,.6,.8)
ddply(dfx, .(group), .fun = summarize, p=p, stats=quantile(age,probs=p))
# dfx %>% group_by(group) %>% do(p=p, stats=quantile(.$age, probs=p))

The ddply solution looks like this (do not load dplyr for this to work):

#    group   p    stats
# 1      A 0.2 32.81104
# 2      A 0.4 34.13195
# 3      A 0.6 37.34055
# 4      A 0.8 44.21874
# 5      B 0.2 25.58858
# 6      B 0.4 34.67511
# 7      B 0.6 40.68370
# 8      B 0.8 44.67346
# 9      C 0.2 37.22625
# 10     C 0.4 42.46769
# 11     C 0.6 43.27065
# 12     C 0.8 44.54724

The dplyr solution (the commented lines) yields the following:

#   group        p    stats
# 1     A <dbl[4]> <dbl[4]>
# 2     B <dbl[4]> <dbl[4]>
# 3     C <dbl[4]> <dbl[4]>

Here, the data is "hidden" in the list elements. Is there a way to directly get the ddply solution above? (Note that I posted this question on the manipulatr mailing list, so far with no answer.)

asked Dec 15 '22 21:12 by sebschub



2 Answers

Check whether this works. The output differs from yours because no seed was set with set.seed():

 dfx %>% group_by(group) %>% do(data.frame(p=p, stats=quantile(.$age, probs=p)))
Source: local data frame [12 x 3]
Groups: group

    group   p    stats
1      A 0.2 27.68069
2      A 0.4 35.36915
3      A 0.6 39.15223
4      A 0.8 46.41073
5      B 0.2 34.68378
6      B 0.4 37.22358
7      B 0.6 40.76185
8      B 0.8 44.48645
9      C 0.2 33.86023
10     C 0.4 36.30515
11     C 0.6 46.80672
12     C 0.8 52.82140
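
To make the numbers repeatable between runs, the sample data could be generated after a call to set.seed(). The following is only a sketch: the seed value 42 is an arbitrary choice of mine, and the base R cross-check at the end is an addition that is not part of the question.

```r
library(dplyr)

set.seed(42)  # arbitrary seed (an assumption), only there to make the data repeatable
dfx <- data.frame(
    group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
    sex = sample(c("M", "F"), size = 29, replace = TRUE),
    age = runif(n = 29, min = 18, max = 54)
    )
p <- c(.2, .4, .6, .8)

# One data frame per group; do() binds them into a single long result
res <- dfx %>%
    group_by(group) %>%
    do(data.frame(p = p, stats = quantile(.$age, probs = p)))

# Cross-check group A against plain base R
check <- quantile(dfx$age[dfx$group == "A"], probs = p)
```

With the seed fixed, rerunning the script should give the same 12 rows each time.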
answered Jan 09 '23 02:01 by akrun



I think you were bitten (as I was) by the new do() syntax in dplyr 0.2, which changed significantly from the earlier 0.1.3 version.

The 0.2 do() has two modes of operation:

  1. If you DO NOT give it named arguments, the expression in ... must return a data frame for each group, and do() combines these into a single data frame.

  2. If you DO give it named arguments, do() stores each group's result as an element of a list-column named after the argument.

Please see ?do for a (probably) more accurate explanation as well as Hadley's blog on the release of v 0.2.
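
The two modes can be sketched side by side as follows. The small data frame and the variable names `unnamed` and `named` are made up for illustration, and this assumes a dplyr version in which do() is still available.

```r
library(dplyr)

# Toy data (hypothetical), two groups of five ages each
dfx <- data.frame(
    group = rep(c("A", "B"), each = 5),
    age = c(21:25, 41:45)
    )
p <- c(.2, .8)

# Mode 1: unnamed argument that returns a data frame per group.
# The per-group data frames are bound into one long data frame.
unnamed <- dfx %>%
    group_by(group) %>%
    do(data.frame(p = p, stats = quantile(.$age, probs = p)))

# Mode 2: named argument. Each group's result is kept as one
# element of a list-column called `stats`, one row per group.
named <- dfx %>%
    group_by(group) %>%
    do(stats = quantile(.$age, probs = p))
```

Here `unnamed` has one row per group and probability, while `named` has one row per group with the quantiles tucked inside `named$stats`.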

answered Jan 09 '23 03:01 by Paul Lemmens
