
dplyr - groupby on multiple columns using variable names

I am using R Shiny for some exploratory data analysis. I have two checkbox inputs, each returning only the options the user has selected. The first checkbox input lists only categorical variables; the second lists only numeric variables. I then group by the first selection and sum the second:

    var1 <- input$variable1      # Checkbox with categorical variables
    var2 <- input$variable2      # Checkbox with numerical variables

    v$data <- dataset %>%
      group_by_(var1) %>%
      summarize_(Sum = interp(~sum(x), x = as.name(var2))) %>%
      arrange(desc(Sum))

When only one categorical variable is selected, this group_by works perfectly. When multiple categorical variables are chosen, the checkbox input returns an array (a character vector) of column names. How do I pass this array of column names to dplyr's group_by?

Neil asked Dec 28 '15 04:12





1 Answer

dplyr version >= 1.0

With more recent versions of dplyr, you should use across() along with a tidyselect helper function. See help("language", "tidyselect") for a list of all the helper functions. In this case, since you want all the columns named in a character vector, use all_of():

    library(dplyr)

    # Character vector of grouping column names
    cols <- c("mpg", "hp", "wt")

    mtcars %>%
      group_by(across(all_of(cols))) %>%
      summarize(x = mean(gear))
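Applied to the setup in the question, the same idea might look like the sketch below. It is only an illustration: var1 and var2 are hard-coded stand-ins for input$variable1 and input$variable2, mtcars replaces the asker's dataset, and the numeric column is looked up with the .data pronoun instead of interp(), which is a swapped-in technique and not part of the original answer.

```r
library(dplyr)

# Hypothetical stand-ins for the Shiny inputs in the question
var1 <- c("cyl", "gear")   # one or more "categorical" columns (assumed selection)
var2 <- "hp"               # a single numeric column (assumed selection)

result <- mtcars %>%
  # all_of() turns the character vector into a column selection
  group_by(across(all_of(var1))) %>%
  # .data[[var2]] refers to the column whose name is stored in var2
  summarize(Sum = sum(.data[[var2]]), .groups = "drop") %>%
  arrange(desc(Sum))

print(result)
```

Because all_of() accepts a character vector of any length, the same pipeline should work whether one or several grouping columns are selected. If the second checkbox can also return several numeric columns, summarize(across(all_of(var2), sum)) is one possible variant; it produces one summed column per selected variable, not a single Sum column.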

Original answer (older versions of dplyr)

If you have a vector of variable names, you should pass it to the .dots= parameter of group_by_() (note that the underscore verbs have been deprecated since dplyr 0.7.0). For example:

    mtcars %>%
      group_by_(.dots = c("mpg", "hp", "wt")) %>%
      summarize(x = mean(gear))
MrFlick answered Oct 01 '22 01:10
