dplyr group by colnames described as vector of strings

Tags: r, dplyr

I'm trying to group_by multiple columns in my data frame. I can't write out every single column name in the group_by call, so I want to pass the column names as a character vector, like so:

cols <- colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]
mtcars %>% filter(disp < 160) %>% group_by(cols) %>% summarise(n = n())

This returns an error:

Error in mutate_impl(.data, dots) : 
  Column `mtcars[colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]]` must be length 12 (the number of rows) or one, not 7

I definitely want to use a dplyr function to do this, but can't figure this one out.

asked Dec 20 '17 18:12

conv3d

2 Answers

You can use group_by_at, which accepts a character vector of column names as the grouping variables:

mtcars %>% 
    filter(disp < 160) %>% 
    group_by_at(cols) %>% 
    summarise(n = n())
# A tibble: 12 x 8
# Groups:   mpg, cyl, disp, drat, qsec, gear [?]
#     mpg   cyl  disp  drat  qsec  gear  carb     n
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1  19.7     6 145.0  3.62 15.50     5     6     1
# 2  21.4     4 121.0  4.11 18.60     4     2     1
# 3  21.5     4 120.1  3.70 20.01     3     1     1
# 4  22.8     4 108.0  3.85 18.61     4     1     1
# ...

Or you can move the column selection inside group_by_at by using vars together with the select helper functions:

mtcars %>% 
    filter(disp < 160) %>% 
    group_by_at(vars(matches('[a-z]{3,}$'))) %>% 
    summarise(n = n())

# A tibble: 12 x 8
# Groups:   mpg, cyl, disp, drat, qsec, gear [?]
#     mpg   cyl  disp  drat  qsec  gear  carb     n
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1  19.7     6 145.0  3.62 15.50     5     6     1
# 2  21.4     4 121.0  4.11 18.60     4     2     1
# 3  21.5     4 120.1  3.70 20.01     3     1     1
# 4  22.8     4 108.0  3.85 18.61     4     1     1
# ...
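
As a quick sanity check, the two forms above can be compared directly. This is only a minimal sketch: the names by_vector and by_helper are hypothetical, and it assumes a dplyr version in which group_by_at is still available. Note that matches ignores case by default, which makes no difference here because all mtcars column names are lowercase.

```r
library(dplyr)

# Same selection as in the question: names ending in three or more lowercase letters
cols <- grep("[a-z]{3,}$", colnames(mtcars), value = TRUE)

# Grouping by a character vector of column names
by_vector <- mtcars %>%
    filter(disp < 160) %>%
    group_by_at(cols)

# Grouping by a select helper wrapped in vars()
by_helper <- mtcars %>%
    filter(disp < 160) %>%
    group_by_at(vars(matches("[a-z]{3,}$")))

# Compare the grouping variables produced by the two approaches
same_groups <- identical(group_vars(by_vector), group_vars(by_helper))
same_groups
```

group_vars returns the grouping column names as a character vector, so comparing them is a cheap way to confirm that both selections pick the same columns before summarising.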

answered Oct 01 '22 07:10

Psidom

I believe group_by_at has now been superseded (as of dplyr 1.0.0) by combining group_by with across. In addition, summarise has an experimental .groups argument that lets you choose how the result is grouped after summarising. Here is an alternative to consider:

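library(dplyr)

# Columns whose names end in three or more lowercase letters
cols <- colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]

# Older approach: the scoped verb group_by_at
original <- mtcars %>% 
  filter(disp < 160) %>% 
  group_by_at(cols) %>% 
  summarise(n = n())

# Newer approach: group_by + across, with an explicit .groups setting
superseded <- mtcars %>%
  filter(disp < 160) %>%
  group_by(across(all_of(cols))) %>%
  summarise(n = n(), .groups = 'drop_last')

# Check that both pipelines give the same result
all.equal(original, superseded)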

Here is a blog post that goes into more detail about using the across function: https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/
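
To illustrate what the .groups argument changes, here is a small sketch. It assumes dplyr 1.0.0 or later, and the two-column cols vector and the names grouped, kept and flat are hypothetical, chosen only to keep the example short.

```r
library(dplyr)

# Hypothetical two-column grouping, used only for brevity
cols <- c("cyl", "gear")

grouped <- mtcars %>%
  group_by(across(all_of(cols)))

# 'drop_last' peels off only the last grouping level,
# so the result is still grouped by the remaining column(s)
kept <- grouped %>%
  summarise(n = n(), .groups = "drop_last")
group_vars(kept)

# 'drop' removes all grouping and returns an ungrouped tibble
flat <- grouped %>%
  summarise(n = n(), .groups = "drop")
group_vars(flat)
```

If later steps in the pipeline should not inherit any grouping, 'drop' is probably the safer choice; 'drop_last' matches what summarise does by default for this kind of summary.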

answered Oct 01 '22 07:10

Harrison Jones
