I'm trying to group_by multiple columns in my data frame and I can't write out every single column name in the group_by function so I want to call the column names as a vector like so:
cols <- colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]
mtcars %>% filter(disp < 160) %>% group_by(cols) %>% summarise(n = n())
This returns error:
Error in mutate_impl(.data, dots) :
Column `mtcars[colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]]` must be length 12 (the number of rows) or one, not 7
I definitely want to use a dplyr function to do this, but can't figure this one out.
group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group".
The group_by() method is used to group the data contained in the data frame based on the columns specified as arguments to the function call.
Group_by() function belongs to the dplyr package in the R programming language, which groups the data frames. Group_by() function alone will not give any output.
You can use group_by_at
, where you can pass a character vector of column names as group variables:
mtcars %>%
filter(disp < 160) %>%
group_by_at(cols) %>%
summarise(n = n())
# A tibble: 12 x 8
# Groups: mpg, cyl, disp, drat, qsec, gear [?]
# mpg cyl disp drat qsec gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 19.7 6 145.0 3.62 15.50 5 6 1
# 2 21.4 4 121.0 4.11 18.60 4 2 1
# 3 21.5 4 120.1 3.70 20.01 3 1 1
# 4 22.8 4 108.0 3.85 18.61 4 1 1
# ...
Or you can move the column selection inside group_by_at
using vars
and column select helper functions:
mtcars %>%
filter(disp < 160) %>%
group_by_at(vars(matches('[a-z]{3,}$'))) %>%
summarise(n = n())
# A tibble: 12 x 8
# Groups: mpg, cyl, disp, drat, qsec, gear [?]
# mpg cyl disp drat qsec gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 19.7 6 145.0 3.62 15.50 5 6 1
# 2 21.4 4 121.0 4.11 18.60 4 2 1
# 3 21.5 4 120.1 3.70 20.01 3 1 1
# 4 22.8 4 108.0 3.85 18.61 4 1 1
# ...
I believe group_by_at
has now been superseded by using a combination of group_by
and across
. And summarise
has an experimental .groups
argument where you can choose how to handle the grouping after you create a summarised object. Here is an alternative to consider:
cols <- colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]
original <- mtcars %>%
filter(disp < 160) %>%
group_by_at(cols) %>%
summarise(n = n())
superseded <- mtcars %>%
filter(disp < 160) %>%
group_by(across(all_of(cols))) %>%
summarise(n = n(), .groups = 'drop_last')
all.equal(original, superseded)
Here is a blog post that goes into more detail about using the across
function:
https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With