Does anyone know of a fast way to select 'all-but-one' (or 'all-but-a-few') columns when using dplyr::group_by?
Ultimately, I just want to aggregate over all distinct rows after removing a few select columns, but I don't want to have to explicitly list all the grouping columns each time (since those get added and removed somewhat frequently in my analysis).
Example:
> df <- data_frame(a = c(1,1,2,2), b = c("foo", "foo", "bar", "bar"), c = runif(4))
> df
Source: local data frame [4 x 3]
      a     b          c
  (dbl) (chr)      (dbl)
1     1   foo 0.95460749
2     1   foo 0.05094088
3     2   bar 0.93032589
4     2   bar 0.40081121
Now I want to aggregate by a and b, so I can do this:
> df %>% group_by(a, b) %>% summarize(mean(c))
Source: local data frame [2 x 3]
Groups: a [?]
      a     b   mean(c)
  (dbl) (chr)     (dbl)
1     1   foo 0.5027742
2     2   bar 0.6655686
Great.
But I'd really like to be able to just specify "not c", similar to dplyr::select(-c):
> df %>% select(-c)
Source: local data frame [4 x 2]
      a     b
  (dbl) (chr)
1     1   foo
2     1   foo
3     2   bar
4     2   bar
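But group_by can apply expressions, so -c is evaluated as a new column (the negated values of c) instead of being read as "exclude c", and the equivalent doesn't work: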
> df %>% group_by(-c) %>% summarize(mean(c))
Source: local data frame [4 x 2]
           -c    mean(c)
        (dbl)      (dbl)
1 -0.95460749 0.95460749
2 -0.93032589 0.93032589
3 -0.40081121 0.40081121
4 -0.05094088 0.05094088
Anyone know if I'm just missing a basic function or shortcut to help me do this quickly?
(Example use case: if df suddenly gains a new column d, I'd like the downstream code to now aggregate over unique combinations of a, b, and d, without me having to explicitly add d to the group_by call.)
In current versions of dplyr, the function group_by_at, together with vars, accomplishes this goal:
df %>% group_by_at(vars(-c)) %>% summarize(sum(c))
# A tibble: 2 x 3
# Groups: a [?]
      a     b  `sum(c)`
  <dbl> <chr>     <dbl>
1     1   foo 0.9851376
2     2   bar 1.0954412
This appears to have been introduced in dplyr 0.7.0, in June 2017.
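For anyone on dplyr 1.0.0 or later: the scoped verbs such as group_by_at are marked as superseded there, and the same "everything except c" grouping can be written with across() inside a plain group_by. The snippet below is only a sketch under a few assumptions: the data frame is a made-up variant of the one in the question, with fixed values in c in place of runif and an extra column d added to mimic the use case; the output name mean_c is my own choice.

```r
library(dplyr)

# Hypothetical data: the question's df with fixed values in c and an
# extra column d, standing in for a grouping column added later.
df <- tibble(
  a = c(1, 1, 2, 2),
  b = c("foo", "foo", "bar", "bar"),
  d = c("x", "x", "y", "y"),
  c = c(0.2, 0.4, 0.6, 0.8)
)

# across(-c) selects every column except c, so the grouping follows
# whatever columns df happens to have; d is picked up without being named.
result <- df %>%
  group_by(across(-c)) %>%
  summarize(mean_c = mean(c), .groups = "drop")

result
```

Here result should have one row per distinct combination of a, b, and d. To drop a few columns instead of one, across(-c(c, e)) follows the same selection rules as select, where e is another hypothetical column name.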