
select minus operator in dplyr group_by

Tags: r, dplyr

Does anyone know of a fast way to select 'all-but-one' (or 'all-but-a-few') columns when using dplyr::group_by? Ultimately, I just want to aggregate over all distinct rows after removing a few select columns, but I don't want to have to explicitly list all the grouping columns each time (since those get added and removed somewhat frequently in my analysis).

Example:

 > df <- data_frame(a = c(1,1,2,2), b = c("foo", "foo", "bar", "bar"), c = runif(4))
 > df
 Source: local data frame [4 x 3]

       a     b          c
   (dbl) (chr)      (dbl)
 1     1   foo 0.95460749
 2     1   foo 0.05094088
 3     2   bar 0.93032589
 4     2   bar 0.40081121

Now I want to aggregate by a and b, so I can do this:

 > df %>% group_by(a, b) %>% summarize(mean(c))
 Source: local data frame [2 x 3]
 Groups: a [?]

       a     b   mean(c)
   (dbl) (chr)     (dbl)
 1     1   foo 0.5027742
 2     2   bar 0.6655686

Great. But I'd really like to be able to specify just "not c", similar to dplyr::select(-c):

 > df %>% select(-c)
 Source: local data frame [4 x 2]

       a     b
   (dbl) (chr)
 1     1   foo
 2     1   foo
 3     2   bar
 4     2   bar

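But group_by evaluates its arguments as expressions, so -c creates a new grouping column holding the negated values of c rather than excluding c, and the equivalent doesn't work: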

 > df %>% group_by(-c) %>% summarize(mean(c))
 Source: local data frame [4 x 2]

            -c    mean(c)
         (dbl)      (dbl)
 1 -0.95460749 0.95460749
 2 -0.93032589 0.93032589
 3 -0.40081121 0.40081121
 4 -0.05094088 0.05094088

Anyone know if I'm just missing a basic function or shortcut to help me do this quickly?

Example use case: if df suddenly gains a new column d, I'd like the downstream code to now aggregate over unique combinations of a, b, and d, without me having to explicitly add d to the group_by call.

mmuurr asked Nov 08 '22 17:11



1 Answer

In current versions of dplyr, the function group_by_at, together with vars, accomplishes this goal:

df %>% group_by_at(vars(-c)) %>% summarize(sum(c))
# A tibble: 2 x 3
# Groups:   a [?]
      a     b  `sum(c)`
  <dbl> <chr>     <dbl>
1     1   foo 0.9851376
2     2   bar 1.0954412

group_by_at appears to have been introduced in dplyr 0.7.0, released in June 2017.
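
For readers on dplyr 1.0.0 or later, where the scoped verbs such as group_by_at are marked as superseded, the same "all but c" grouping can probably be written with across() inside group_by. The following is a minimal sketch under that assumption, not a tested replacement for the code above: the values of c are fixed instead of random so the result is reproducible, and the names mean_c, df2, result, and result2 are made up for illustration.

```r
library(dplyr)

# Same shape as the question's data, but with fixed values for c
df <- tibble(
  a = c(1, 1, 2, 2),
  b = c("foo", "foo", "bar", "bar"),
  c = c(0.25, 0.75, 0.5, 1.0)
)

# Group by every column except c, then aggregate c
result <- df %>%
  group_by(across(-c)) %>%
  summarize(mean_c = mean(c), .groups = "drop")

# The question's use case: a new column d appears upstream
df2 <- df
df2$d <- c("x", "y", "x", "x")

# The same pipeline now groups by a, b, and d without any change
result2 <- df2 %>%
  group_by(across(-c)) %>%
  summarize(mean_c = mean(c), .groups = "drop")
```

Here result has one row per (a, b) pair, while result2 has one row per distinct (a, b, d) combination, so the grouping follows whatever columns are present apart from c.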

user295691 answered Nov 15 '22 07:11
