Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to group by all but one columns?

Tags:

r

dplyr

How do I tell group_by to group the data by all columns except a given one?

With aggregate, it would be aggregate(x ~ ., ...).

I tried group_by(data, -x), but that groups by the negative-of-x (i.e. the same as grouping by x).

like image 334
Roman Cheplyaka Avatar asked Aug 27 '16 12:08

Roman Cheplyaka


People also ask

How do I group by a specific column in SQL?

Syntax: SELECT column1, function_name(column2) FROM table_name WHERE condition GROUP BY column1, column2 HAVING condition ORDER BY column1, column2; function_name: Name of the function used for example, SUM() , AVG(). table_name: Name of the table. condition: Condition used.

How do I group multiple columns?

A shortcut way to group rows or columns is to highlight the rows/columns you wish to group and use ALT+SHIFT+RIGHT ARROW to group the rows/columns, and ALT+SHIFT+LEFT ARROW to ungroup them. You can go multiple levels as well (so you could group rows 1-30, and then group rows 20-25 as a subgroup of the first).

Can we group by two columns in SQL?

SELECT Statement: The GROUP BY Clause in SQLA GROUP BY clause can contain two or more columns—or, in other words, a grouping can consist of two or more columns.


3 Answers

dplyr version 1.0+

In dplyr 1.0.0 coming up, the _at functions are falling into the superseded lifecycle (i.e. while they remain in dplyr for the foreseeable future, there are now better alternatives that are more actively developed). The new way to accomplish this is via the across function:

df %>%
  group_by(across(c(-hp)))

dplyr v 0.7+

A small update on this question because I stumbled across this myself and found an elegant solution with current version of dplyr (0.7.4): Inside group_by_at(), you can supply the names of columns the same way as in the select() function using vars(). This enables us to group by everything but one column (hp in this example) by writing:

library(dplyr)
df <- as_tibble(mtcars, rownames = "car")
df %>% group_by_at(vars(-hp))
like image 76
Jannik Buhr Avatar answered Oct 05 '22 11:10

Jannik Buhr


Building on the @eipi10's dplyr 0.7.0 edit, group_by_at appears to be the right function for this job. However, if you are simply looking to omit column "x", then you can use:

new2.0 <- dat %>%
  group_by_at(vars(-x)) %>%
  summarize(mean_value = mean(value))

Using @eipi10's example data:

# Fake data
set.seed(492)
dat <- data.frame(value = rnorm(1000),
             g1 = sample(LETTERS, 1000, replace = TRUE),
             g2 = sample(letters, 1000, replace = TRUE),
             g3 = sample(1:10, replace = TRUE),
             other = sample(c("red", "green", "black"), 1000, replace = TRUE))

new <- dat %>% 
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue = mean(value))


new2.0 <- dat %>% 
  group_by_at(vars(-value)) %>% 
  summarize(meanValue = mean(value))

identical(new, new2.0)
# [1] TRUE
like image 36
ZS27 Avatar answered Oct 05 '22 11:10

ZS27


You can do this using standard evaluation (group_by_ instead of group_by):

# Fake data
set.seed(492)
dat = data.frame(value=rnorm(1000), g1=sample(LETTERS,1000,replace=TRUE),
                 g2=sample(letters,1000,replace=TRUE), g3=sample(1:10, replace=TRUE),
                 other=sample(c("red","green","black"),1000,replace=TRUE))

dat %>% group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
1       A      a     2  green  0.89281475
2       A      b     2    red -0.03558775
3       A      b     5  black -1.79184218
4       A      c    10  black  0.17518610
5       A      e     5  black  0.25830392
...

See this vignette for more on standard vs. non-standard evaluation in dplyr.

UPDATE for dplyr 0.7.0

To address @ÖmerAn's comment: It looks like group_by_at is the way to go in dplyr 0.7.0 (someone please correct me if I'm wrong about this). For example:

dat %>% 
  group_by_at(setdiff(names(dat), "value")) %>%
  summarise(meanValue=mean(value))
# Groups:   g1, g2, g3 [?]
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
 1      A      a     2  green  0.89281475
 2      A      b     2    red -0.03558775
 3      A      b     5  black -1.79184218
 4      A      c    10  black  0.17518610
 5      A      e     5  black  0.25830392
 6      A      e     5    red -0.81879788
 7      A      e     7  green  0.30836054
 8      A      f     2  green  0.05537047
 9      A      g     1  black  1.00156405
10      A      g    10  black  1.26884303
# ... with 949 more rows

Let's confirm both methods give the same output (in dplyr 0.7.0):

new = dat %>% 
  group_by_at(setdiff(names(dat), "value")) %>%
  summarise(meanValue=mean(value))

old = dat %>% 
  group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

identical(old, new)
# [1] TRUE
like image 35
eipi10 Avatar answered Oct 05 '22 12:10

eipi10