 

dplyr::group_by_ with character string input of several variable names


I'm writing a function where the user defines one or more grouping variables in the function call. The data is then grouped with dplyr. This works as expected with a single grouping variable, but I haven't figured out how to do it with several grouping variables.

Example:

x <- c("cyl")
y <- c("cyl", "gear")
dots <- list(~cyl, ~gear)

library(dplyr)
library(lazyeval)

mtcars %>% group_by_(x)             # groups by cyl
mtcars %>% group_by_(y)             # groups only by cyl (not gear)
mtcars %>% group_by_(.dots = dots)  # groups by cyl and gear, this is what I want.

I tried to turn y into the same kind of object as dots using:

mtcars %>% group_by_(.dots = interp(~var, var = list(y)))
#Error: is.call(expr) || is.name(expr) || is.atomic(expr) is not TRUE

How can I use a user-defined character vector of more than one variable name (like y in the example) to group the data with dplyr?

(This question is somewhat related to this one, but it was not answered there.)

asked Dec 29 '14 11:12 by talat





2 Answers

No need for interp here, just use as.formula to convert the strings to formulas:

dots = sapply(y, . %>% {as.formula(paste0('~', .))})
mtcars %>% group_by_(.dots = dots)
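To check what the as.formula() step actually builds, here is a minimal base-R sketch (no dplyr needed). It uses a plain lapply() with an anonymous function instead of the magrittr functional sequence `. %>% {...}`; the name `dots` simply mirrors the answer. Each element should be a one-sided formula naming a single column, just like `list(~cyl, ~gear)` in the question.

```r
y <- c("cyl", "gear")

# Build one one-sided formula per column name, e.g. "cyl" -> ~cyl
dots <- lapply(y, function(v) as.formula(paste0("~", v)))

# Show each formula as text
vapply(dots, function(f) deparse(f), character(1))

# Each formula refers to exactly one variable, in the original order
vapply(dots, all.vars, character(1))
```

Unlike `interp(~var, var = list(y))`, which produces a single formula wrapping the whole vector, this yields one formula per variable.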

The reason why your interp approach doesn’t work is that the expression gives you back the following:

~list(c("cyl", "gear")) 

That is not what you want. You could, of course, sapply interp over y, which would be similar to using as.formula above:

dots1 = sapply(y, . %>% {interp(~var, var = .)}) 

But, in fact, you can also directly pass y:

mtcars %>% group_by_(.dots = y) 

The dplyr vignette on non-standard evaluation goes into more detail and explains the difference between these approaches.
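With tidy evaluation (dplyr 0.7 and later), the same thing can be done with the regular group_by() instead of group_by_(). A minimal sketch, assuming a reasonably recent dplyr (which pulls in rlang as a dependency): rlang::syms() turns the character vector into a list of symbols, and `!!!` splices them into the call as if you had typed `group_by(cyl, gear)`.

```r
library(dplyr)

y <- c("cyl", "gear")

# Convert the strings to symbols and splice them into group_by()
mtcars_grp <- mtcars %>% group_by(!!!rlang::syms(y))

# The grouping variables are cyl and gear, in that order
group_vars(mtcars_grp)
```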

answered May 08 '23 21:05 by Konrad Rudolph



slice_rows() from the purrrlyr package (https://github.com/hadley/purrrlyr) groups a data.frame by taking a vector of column names (strings) or positions (integers):

library(dplyr)  # for %>% and group_vars()

y <- c("cyl", "gear")
mtcars_grp <- mtcars %>% purrrlyr::slice_rows(y)

class(mtcars_grp)
#> [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

group_vars(mtcars_grp)
#> [1] "cyl"  "gear"

Particularly useful now that group_by_() has been deprecated.
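Without an extra package, a sketch of the kind of function described in the question could look like this, assuming dplyr 1.0 or later. The name group_by_strings() is hypothetical. It uses across() with all_of(), which selects the columns named in a character vector and raises an error if any of them does not exist.

```r
library(dplyr)

# Hypothetical helper: group `data` by the columns named in the character vector `vars`
group_by_strings <- function(data, vars) {
  # all_of() errors if any name in `vars` is not a column of `data`
  data %>% group_by(across(all_of(vars)))
}

by_one <- group_by_strings(mtcars, "cyl")
by_two <- group_by_strings(mtcars, c("cyl", "gear"))

group_vars(by_two)
```

The same function handles one or several grouping variables, which was the original goal. In dplyr 1.1.0 and later, `group_by(pick(all_of(vars)))` should be an equivalent alternative.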

answered May 08 '23 21:05 by wjchulme
