
Adding another grouping with dplyr

Tags: r, dplyr

I would like to mutate a data frame twice, grouping by two overlapping sets of columns, where the second set extends the first, i.e.:

df <- df %>% group_by(a, b) %>% mutate(x = sum(d))
df <- df %>% group_by(a, b, c) %>% mutate(y = sum(e))

Is there a faster/more elegant way to do this? I was hoping to be able to do something like:

df <- df %>%
    group_by(a, b) %>%
    mutate(x = sum(d)) %>%
    group_by(c) %>%
    mutate(y = sum(e))

Or perhaps save a variable with the first group_by applied and then use it twice.

Sam Brightman asked Oct 29 '15 18:10


1 Answer

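Use add=TRUE in the second group_by to add c to the existing groups, so that the second mutate is grouped by all three variables (a, b and c) in the OP's example. In dplyr 1.0.0 and later this argument is spelled .add, and add is deprecated: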

 df %>%
   group_by(a, b) %>%
   mutate(x = sum(d)) %>%
   group_by(c, add=TRUE) %>%
   mutate(y = sum(e))

According to the documentation (?group_by):

By default, when add = FALSE, group_by will override existing groups. To instead add to the existing groups, use add = TRUE
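
As a quick check of the above, here is a minimal sketch on a made-up data frame (the values are hypothetical; only the column names come from the question). It assumes dplyr 1.0.0 or later, where the argument is spelled .add rather than add, and compares the chained version with the two separate group_by calls from the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names a, b, c, d, e come from the question.
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c(1, 1, 1, 1, 1, 1),
  c = c("u", "u", "v", "v", "u", "v"),
  d = 1:6,
  e = c(10, 20, 30, 40, 50, 60)
)

# Chained version: group by (a, b), then add c to the existing groups.
# Assumes dplyr >= 1.0.0; on older versions the argument is `add = TRUE`.
chained <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c, .add = TRUE) %>%
  mutate(y = sum(e)) %>%
  ungroup()

# Reference version: the two explicit group_by calls from the question.
two_step <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(a, b, c) %>%
  mutate(y = sum(e)) %>%
  ungroup()

# Both routes should give the same x and y columns.
same_x <- identical(chained$x, two_step$x)
same_y <- identical(chained$y, two_step$y)

print(as.data.frame(chained))
```

Calling ungroup() at the end is optional; it is there only so the result does not carry the (a, b, c) grouping into later steps.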

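This can also be done with a single group_by call, but only by using a non-dplyr function, here base R's ave(), to sum e within each level of c inside the (a, b) groups:

 df %>%
   group_by(a, b) %>%
   # FUN must be passed by name; unnamed arguments to ave() are treated as grouping variables
   mutate(x = sum(d), y = ave(e, c, FUN = sum))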
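
To illustrate the single group_by approach, here is a small sketch on hypothetical data (again, only the column names come from the question). Note that ave() has the signature ave(x, ..., FUN = mean), so the summary function has to be supplied as FUN = sum; inside a grouped mutate, e and c refer to the slice of each column for the current (a, b) group.

```r
library(dplyr)

# Hypothetical toy data; only the column names a, b, c, d, e come from the question.
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c(1, 1, 1, 1, 1, 1),
  c = c("u", "u", "v", "v", "u", "v"),
  d = 1:6,
  e = c(10, 20, 30, 40, 50, 60)
)

# One group_by call: x is the per-(a, b) sum of d, and ave() computes the
# sum of e for each level of c within the current (a, b) group.
one_call <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d), y = ave(e, c, FUN = sum)) %>%
  ungroup()

print(as.data.frame(one_call))
# x: 10 10 10 10 11 11
# y: 30 30 70 70 50 60
```

The trade-off is that ave() does its own splitting in base R on every group, so for large data the chained group_by version is probably the more idiomatic choice.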

akrun answered Oct 23 '22 04:10