 

Any way to do filtering as well as summarizing in ddply?

Tags:

r

dplyr

plyr

I'm just starting with ddply and finding it very useful. I want to summarize a data frame and also drop some rows from the final output based on the value of a summarized column. This is like combining GROUP BY with HAVING in SQL. Here's an example:

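library(plyr)

input = data.frame(id=     c( 1, 1, 2, 2, 3,   3),
                   metric= c(30,50,70,90,40,1050),
                   badness=c( 1, 5, 7, 3, 3,  99))
# GROUP BY id: one row per id with the mean metric and the worst badness
intermediateoutput = ddply(input, ~ id, summarize,
                           meanMetric=mean(metric),
                           maxBadness=max(badness))
# HAVING maxBadness < 50, keeping only the id and meanMetric columns
intermediateoutput[intermediateoutput$maxBadness < 50, c("id", "meanMetric")]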

This gives:

  id meanMetric
1  1         40
2  2         80

which is what I want, but can I do it in a single step within the ddply statement somehow?
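For comparison, the two steps above can be collapsed into a single expression without leaving plyr by wrapping the `ddply()` call in base R's `subset()`. This is only a sketch: the filtering is done by `subset()` (base R), not by `ddply` itself, and `plyr::summarise` is named explicitly so the code still works if dplyr is also attached.

```r
library(plyr)

input <- data.frame(id      = c( 1,  1,  2,  2,  3,    3),
                    metric  = c(30, 50, 70, 90, 40, 1050),
                    badness = c( 1,  5,  7,  3,  3,   99))

# GROUP BY id (ddply + summarise), then HAVING maxBadness < 50 (subset),
# keeping only the id and meanMetric columns, all in one expression.
result <- subset(
  ddply(input, ~ id, plyr::summarise,
        meanMetric = mean(metric),
        maxBadness = max(badness)),
  maxBadness < 50,
  select = c(id, meanMetric)
)
print(result)
```

The intermediate `maxBadness` column still exists inside the expression; `select =` simply drops it from the returned data frame.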

asked Jul 16 '14 at 13:07 by TooTone


1 Answer

You should try dplyr instead. It is generally faster, and the code is much easier to read and understand, especially if you use the pipe operator (%>%):

library(dplyr)

input %>%
    group_by(id) %>%
    summarize(meanMetric = mean(metric), maxBadness = max(badness)) %>%
    filter(maxBadness < 50) %>%   # the HAVING step
    select(-maxBadness)           # drop the helper column
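Because the question already uses plyr, one pitfall is worth flagging: if plyr is attached after dplyr, the bare `summarize()` in the pipeline above resolves to `plyr::summarize`, which ignores `group_by()` and collapses everything into a single row. A minimal sketch that sidesteps this by namespacing the call explicitly (the variable name `out` is just for illustration):

```r
library(plyr)   # used for ddply, as in the question
library(dplyr)

input <- data.frame(id      = c( 1,  1,  2,  2,  3,    3),
                    metric  = c(30, 50, 70, 90, 40, 1050),
                    badness = c( 1,  5,  7,  3,  3,   99))

# dplyr::summarise respects group_by(); plyr::summarise would not.
out <- input %>%
  group_by(id) %>%
  dplyr::summarise(meanMetric = mean(metric), maxBadness = max(badness)) %>%
  filter(maxBadness < 50) %>%   # HAVING maxBadness < 50
  select(-maxBadness)           # keep only id and meanMetric

print(out)
```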

Following @Arun's comment, you can simplify the code by filtering whole groups before summarizing:

input %>%
    group_by(id) %>%
    filter(max(badness) < 50) %>%   # drop every id whose max badness is 50 or more
    summarize(meanMetric = mean(metric))
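The simplification works because `filter()` on a grouped data frame evaluates `max(badness)` once per `id` and then keeps or drops all rows of that group together. A small sketch showing the rows that survive the grouped filter, before any summarizing (`kept` is an illustrative name):

```r
library(dplyr)

input <- data.frame(id      = c( 1,  1,  2,  2,  3,    3),
                    metric  = c(30, 50, 70, 90, 40, 1050),
                    badness = c( 1,  5,  7,  3,  3,   99))

# max(badness) is computed within each id, so both rows of id 3
# (max badness 99) are removed, while ids 1 and 2 keep all their rows.
kept <- input %>%
  group_by(id) %>%
  filter(max(badness) < 50) %>%
  ungroup()

print(kept)
```

Since the condition is applied to the raw rows, no `maxBadness` helper column is ever created, so the final `select()` step is no longer needed.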
answered Sep 28 '22 at 02:09 by juba