I would like to mutate a data frame twice, grouping by two overlapping sets of columns, i.e.:
df <- df %>% group_by(a, b) %>% mutate(x = sum(d))
df <- df %>% group_by(a, b, c) %>% mutate(y = sum(e))
Is there a faster/more elegant way to do this? I was hoping to be able to do something like:
df <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c) %>%
  mutate(y = sum(e))
Or perhaps save a variable with the first group_by() applied and then use it twice.
The group_by() function groups the rows of a data frame by the columns passed to it as arguments; verbs applied afterwards, such as mutate(), are then evaluated separately within each group.
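A minimal sketch of what grouping does, assuming dplyr is installed; the toy data frame is hypothetical. group_vars() and n_groups() report the grouping metadata, and a grouped mutate() evaluates its expression once per group and recycles the result onto every row of that group:

```r
library(dplyr)

# Hypothetical toy data: a and b are grouping columns, d is a value column
df <- data.frame(
  a = c(1, 1, 1, 2),
  b = c(1, 1, 2, 1),
  d = c(1, 2, 3, 4)
)

grouped <- df %>% group_by(a, b)

# Grouping only adds metadata; the rows themselves are unchanged
group_vars(grouped)  # "a" "b"
n_groups(grouped)    # 3

# sum(d) is computed within each (a, b) group and repeated on that group's rows
res <- grouped %>% mutate(x = sum(d))
res$x  # 3 3 3 4
```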
%>% is the forward pipe operator in R. It chains commands by forwarding a value, or the result of an expression, into the next function call or expression, so that x %>% f(y) is evaluated as f(x, y). It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
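A small illustration of the pipe semantics using magrittr directly (the numbers are arbitrary). The last call uses magrittr's `.` placeholder, which sends the left-hand side to an argument other than the first:

```r
library(magrittr)

# x %>% f() %>% g() is evaluated as g(f(x))
piped  <- c(1, 4, 9) %>% sqrt() %>% sum()
nested <- sum(sqrt(c(1, 4, 9)))
identical(piped, nested)  # TRUE

# With `.` as an argument, the left-hand side goes there instead of first
labels <- c(1, 4, 9) %>% paste("value:", .)
labels  # "value: 1" "value: 4" "value: 9"
```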
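In ungroup(), the arguments name the grouping variables to remove. In group_by(), the .add argument controls what happens to existing groups: when FALSE, the default, group_by() overrides them; when TRUE, the new variables are added to them.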
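A sketch contrasting these behaviours on a hypothetical three-row data frame, assuming dplyr 1.0.0 or later (needed for .add and for ungroup() with arguments):

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2), b = c(1, 2, 1), c = c(1, 1, 2))

# Default (.add = FALSE): the new grouping replaces the old one
g_replace <- df %>% group_by(a, b) %>% group_by(c)
group_vars(g_replace)  # "c"

# .add = TRUE: c is appended to the existing (a, b) grouping
g_add <- df %>% group_by(a, b) %>% group_by(c, .add = TRUE)
group_vars(g_add)      # "a" "b" "c"

# ungroup() with arguments drops only the named grouping variables
g_partial <- ungroup(g_add, c)
group_vars(g_partial)  # "a" "b"
```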
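We use add = TRUE in the second group_by() so that c is added to the existing (a, b) grouping instead of replacing it, giving the three-variable grouping from the OP's example (since dplyr 1.0.0 the argument is spelled .add):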
df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c, .add = TRUE) %>%  # spelled add = TRUE in dplyr < 1.0.0
  mutate(y = sum(e))
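A runnable version of this answer on hypothetical toy data, assuming dplyr 1.0.0 or later (hence .add rather than add). The values can be checked by hand: the (a, b) = (1, 1) group holds d = 1, 2, 3, so x is 6 on those three rows. The final ungroup() is an optional addition so that later verbs are not silently grouped:

```r
library(dplyr)

# Hypothetical toy data using the column names from the question
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c(1, 1, 1, 2, 1, 1),
  c = c(1, 1, 2, 1, 1, 2),
  d = c(1, 2, 3, 4, 5, 6),
  e = c(10, 20, 35, 40, 50, 60)
)

res <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%          # sum of d within each (a, b) group
  group_by(c, .add = TRUE) %>%    # grouping is now (a, b, c)
  mutate(y = sum(e)) %>%          # sum of e within each (a, b, c) group
  ungroup()                       # drop the grouping once done

res$x  # 6 6 6 4 11 11
res$y  # 30 30 35 40 50 60
```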
According to the documentation for ?group_by:
By default, when add = FALSE, group_by will override existing groups. To instead add to the existing groups, use add = TRUE.
This can also be done with a single group_by() call, but only by using a non-dplyr function such as base R's ave() inside mutate():
df %>%
  group_by(a, b) %>%
  mutate(x = sum(d), y = ave(e, c, FUN = sum))  # FUN must be named: ave(x, ..., FUN)
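A quick consistency check on hypothetical toy data, comparing the two-step version from the question with a single group_by() version. FUN is passed by name because ave() takes its grouping variables through `...`; an unnamed sum would be treated as another grouping factor:

```r
library(dplyr)

# Hypothetical toy data using the column names from the question
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c(1, 1, 1, 2, 1, 1),
  c = c(1, 1, 2, 1, 1, 2),
  d = c(1, 2, 3, 4, 5, 6),
  e = c(10, 20, 35, 40, 50, 60)
)

# Two-step version from the question
two_step <- df %>% group_by(a, b) %>% mutate(x = sum(d))
two_step <- two_step %>% group_by(a, b, c) %>% mutate(y = sum(e))

# One group_by(): inside each (a, b) group, ave() splits e by c and
# replaces each element with the sum of its c sub-group
one_step <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d), y = ave(e, c, FUN = sum))

all.equal(two_step$y, one_step$y)  # TRUE
```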