How do I tell <code>group_by</code> to group the data by all columns except a given one? With <code>aggregate</code>, it would be <code>aggregate(x ~ ., ...)</code>. I tried <code>group_by(data, -x)</code>, but that groups by the negative-of-x (i.e. the same as grouping by x).


How to group by all but one column?

3 Answers

dplyr version 1.0+

As of dplyr 1.0.0, the _at functions are superseded: they remain in dplyr for the foreseeable future, but there are now better alternatives that are more actively developed. The new way to accomplish this is with the across() function:

df %>%
  group_by(across(c(-hp)))
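
As a minimal sketch of how this behaves end to end, assuming dplyr 1.0.0 or later and a small made-up data frame `toy` (hypothetical, not from the question), grouping by everything except `value` and then summarising might look like this:

```r
library(dplyr)

# Hypothetical toy data: two grouping columns and one value column
toy <- data.frame(
  g1    = c("a", "a", "b", "b"),
  g2    = c("x", "x", "y", "z"),
  value = c(1, 3, 5, 7)
)

# Group by every column except `value`, then average `value` within each group
res <- toy %>%
  group_by(across(-value)) %>%
  summarise(mean_value = mean(value), .groups = "drop")

res
```

The `.groups = "drop"` argument only removes the grouping from the result; it does not change which columns were used for grouping.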

dplyr version 0.7+

A small update on this question, because I stumbled across it myself and found an elegant solution with the then-current version of dplyr (0.7.4): inside group_by_at(), you can supply column names the same way as in select() by wrapping them in vars(). This lets us group by everything but one column (hp in this example) by writing:

library(dplyr)
df <- as_tibble(mtcars, rownames = "car")
df %>% group_by_at(vars(-hp))
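
To check which columns actually ended up as grouping variables, one option is `group_vars()`. This is only a sketch on the same `mtcars` example; the name `grouped` is introduced here for illustration:

```r
library(dplyr)

df <- as_tibble(mtcars, rownames = "car")

# Group by every column except hp
grouped <- df %>% group_by_at(vars(-hp))

# The grouping variables should be all column names of df except "hp"
group_vars(grouped)
setdiff(names(df), "hp")
```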

answered Oct 05 '22 11:10

Jannik Buhr

Building on @eipi10's dplyr 0.7.0 edit, group_by_at appears to be the right function for this job. If you simply want to omit column "x", you can use:

new2.0 <- dat %>%
  group_by_at(vars(-x)) %>%
  summarize(mean_value = mean(value))

Using @eipi10's example data:

# Fake data
set.seed(492)
dat <- data.frame(value = rnorm(1000),
             g1 = sample(LETTERS, 1000, replace = TRUE),
             g2 = sample(letters, 1000, replace = TRUE),
             g3 = sample(1:10, replace = TRUE),
             other = sample(c("red", "green", "black"), 1000, replace = TRUE))

new <- dat %>% 
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue = mean(value))


new2.0 <- dat %>% 
  group_by_at(vars(-value)) %>% 
  summarize(meanValue = mean(value))

identical(new, new2.0)
# [1] TRUE

answered Oct 05 '22 11:10

ZS27

You can do this using standard evaluation (group_by_ instead of group_by):

# Fake data
set.seed(492)
dat = data.frame(value=rnorm(1000), g1=sample(LETTERS,1000,replace=TRUE),
                 g2=sample(letters,1000,replace=TRUE), g3=sample(1:10, replace=TRUE),
                 other=sample(c("red","green","black"),1000,replace=TRUE))

dat %>% group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
1       A      a     2  green  0.89281475
2       A      b     2    red -0.03558775
3       A      b     5  black -1.79184218
4       A      c    10  black  0.17518610
5       A      e     5  black  0.25830392
...

See this vignette for more on standard vs. non-standard evaluation in dplyr.

UPDATE for `dplyr` 0.7.0

To address @ÖmerAn's comment: It looks like group_by_at is the way to go in dplyr 0.7.0 (someone please correct me if I'm wrong about this). For example:

dat %>% 
  group_by_at(setdiff(names(dat), "value")) %>%
  summarise(meanValue=mean(value))

# Groups:   g1, g2, g3 [?]
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
 1      A      a     2  green  0.89281475
 2      A      b     2    red -0.03558775
 3      A      b     5  black -1.79184218
 4      A      c    10  black  0.17518610
 5      A      e     5  black  0.25830392
 6      A      e     5    red -0.81879788
 7      A      e     7  green  0.30836054
 8      A      f     2  green  0.05537047
 9      A      g     1  black  1.00156405
10      A      g    10  black  1.26884303
# ... with 949 more rows

Let's confirm both methods give the same output (in dplyr 0.7.0):

new = dat %>% 
  group_by_at(setdiff(names(dat), "value")) %>%
  summarise(meanValue=mean(value))

old = dat %>% 
  group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

identical(old, new)
# [1] TRUE
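
For dplyr 1.0.0 and later, where `group_by_()` is deprecated and `group_by_at()` is superseded, the same character-vector approach can probably be expressed with `across()` and `all_of()`. The following is a sketch under that assumption, using a small hypothetical data frame `small` rather than the `dat` above:

```r
library(dplyr)

# Hypothetical small data set (not the `dat` used above)
small <- data.frame(
  value = c(2, 4, 6, 8),
  g1    = c("A", "A", "B", "B"),
  other = c("red", "red", "red", "green")
)

# Character vector of every column name except "value"
group_cols <- setdiff(names(small), "value")

# Grouping by a character vector of column names
by_names <- small %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(meanValue = mean(value), .groups = "drop")

# Grouping by negating the unwanted column directly
by_negation <- small %>%
  group_by(across(-value)) %>%
  summarise(meanValue = mean(value), .groups = "drop")

# Compare the two results
all.equal(by_names, by_negation)
```

The `all_of()` form is handy when the column names to exclude are computed at run time; the negation form is shorter when the column is known in advance.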

answered Oct 05 '22 12:10

eipi10


Tags: r, dplyr

Roman Cheplyaka
