Grouping Over All Possible Combinations of Several Variables With dplyr

Tags: r, dplyr, summary

Given a situation such as the following:

library(dplyr)
myData <- tibble(var1 = rnorm(100), 
                 var2 = letters[1:3] %>%
                        sample(100, replace = TRUE) %>%
                        factor(), 
                 var3 = LETTERS[1:3] %>%
                        sample(100, replace = TRUE) %>%
                        factor(), 
                 var4 = month.abb[1:3] %>%
                        sample(100, replace = TRUE) %>%
                        factor())

I would like to group `myData` so that I can eventually compute summary statistics for every possible combination of var2, var3, and var4.

I can create a list with all possible combinations of variables as character values with

groupNames <- names(myData)[2:4]

myGroups <- Map(combn, 
              list(groupNames), 
              seq_along(groupNames),
              simplify = FALSE) %>%
              unlist(recursive = FALSE)
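To make the contents of `myGroups` concrete, here is an equivalent base-R sketch. It uses `lapply()` in place of `Map()`, and writes the three names out by hand instead of taking them from `myData`; otherwise it builds the same list.

```r
# The three grouping-column names, written out by hand
# (in the question they come from names(myData)[2:4])
groupNames <- c("var2", "var3", "var4")

# combn(..., k, simplify = FALSE) returns every k-element subset as a
# character vector; unlist(recursive = FALSE) flattens the per-size lists
myGroups <- unlist(
  lapply(seq_along(groupNames), function(k) combn(groupNames, k, simplify = FALSE)),
  recursive = FALSE
)

length(myGroups)
vapply(myGroups, paste, character(1), collapse = "+")
```

With three variables this gives 2^3 - 1 = 7 non-empty combinations: `var2`, `var3`, `var4`, `var2+var3`, `var2+var4`, `var3+var4` and `var2+var3+var4`.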

My plan was to make separate data sets for each variable combination with a for() loop, something like

### This Does Not Work
for (i in 1:length(myGroups)){
     assign( myGroups[i]%>%
             unlist() %>%
             paste0(collapse = "")%>%
             paste0("Data"), 
               myData %>% 
               group_by_(lapply(myGroups[[i]], as.symbol)) %>%
               summarise( n = length(var1), 
                             avgVar1 = var1 %>%
                                       mean()))
}
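For comparison, here is a minimal sketch of a loop that does run, assuming dplyr >= 1.0.0 (needed for `summarise(.groups = )`). It swaps `group_by_()` for `group_by(!!!rlang::syms(g))`, which turns each character vector of names into column symbols. It also stores the summaries in a named list instead of creating separate objects with `assign()`. The list name `results` and the seed are illustrative choices, not part of the question.

```r
library(dplyr)

set.seed(1)  # illustrative seed, only to make the sketch reproducible
myData <- tibble(
  var1 = rnorm(100),
  var2 = factor(sample(letters[1:3], 100, replace = TRUE)),
  var3 = factor(sample(LETTERS[1:3], 100, replace = TRUE)),
  var4 = factor(sample(month.abb[1:3], 100, replace = TRUE))
)

# All non-empty combinations of the grouping columns, as character vectors
groupNames <- names(myData)[2:4]
myGroups <- unlist(
  lapply(seq_along(groupNames), function(k) combn(groupNames, k, simplify = FALSE)),
  recursive = FALSE
)

# One summary per combination, kept in a named list instead of assign()
results <- list()
for (g in myGroups) {
  key <- paste0(paste0(g, collapse = ""), "Data")  # e.g. "var2var3Data"
  results[[key]] <- myData %>%
    group_by(!!!rlang::syms(g)) %>%  # strings -> column symbols
    summarise(n = n(), avgVar1 = mean(var1), .groups = "drop")
}

names(results)
```

Each summary is then available as, for example, `results[["var2var3Data"]]`, and the whole list can be processed further with `lapply()`.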

Admittedly I am not very good with lists, and searching for this issue was challenging because dplyr updates have changed how grouping works.

If there is a better way to do this than separate data sets I would love to know.

I've gotten a loop similar to above working when I am only grouping by a single variable.

Any and all help is greatly appreciated! Thank you!

asked Mar 11 '15 16:03

Michael

2 Answers

I have created a function based on the answer of @Gregor and the comments that followed:

library(magrittr)
myData <- tibble::tibble(var1 = rnorm(100), 
                         var2 = letters[1:3] %>%
                                sample(100, replace = TRUE) %>%
                                factor(), 
                         var3 = LETTERS[1:3] %>%
                                sample(100, replace = TRUE) %>%
                                factor(), 
                         var4 = month.abb[1:3] %>%
                                sample(100, replace = TRUE) %>%
                                factor())

Function `combSummarise`

combSummarise <- function(data, variables, summarise){
  # Requires magrittr (for %>%) to be attached, and dplyr and plyr installed

  # Get all different combinations of selected variables (credit to @Michael)
  myGroups <- lapply(seq_along(variables), function(x) {
    combn(c(variables), x, simplify = FALSE)}) %>%
    unlist(recursive = FALSE)

  # Group by each combination and summarise (credit to @konvas).
  # The summary expressions are strings, so the lapply() call is built as text
  # and evaluated; group_by_() and summarize_() are deprecated in current dplyr.
  df <- eval(parse(text=paste("lapply(myGroups, function(x){
             dplyr::group_by_(data, .dots=x) %>% 
             dplyr::summarize_( \"", paste(summarise, collapse="\",\""),"\")})"))) %>% 
        do.call(plyr::rbind.fill,.)

  # Move the grouping columns (the largest combination) to the front;
  # drop = FALSE keeps a data frame even when only one column is selected
  groupNames <- c(myGroups[[length(myGroups)]])
  newNames <- names(df)[!(names(df) %in% groupNames)]
  df <- df[, c(groupNames, newNames), drop = FALSE]
  df
}
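The `eval(parse())` step in `combSummarise` can probably be avoided. Below is a minimal sketch, not the answer author's code, assuming dplyr >= 1.0.0 and rlang (installed alongside dplyr). `combSummarise2` is a hypothetical name. The sketch keeps the same string inputs, but it converts the summary strings into expressions with `rlang::parse_exprs()` and the column names into symbols with `rlang::syms()`. It is run on the built-in `mtcars` only so the result is reproducible.

```r
library(dplyr)

# Hypothetical rewrite: same string inputs as combSummarise, but without
# eval(parse()) or the deprecated underscore verbs (assumes dplyr >= 1.0.0)
combSummarise2 <- function(data, variables, summarise) {
  # Every non-empty combination of the grouping variables
  myGroups <- unlist(
    lapply(seq_along(variables), function(k) combn(variables, k, simplify = FALSE)),
    recursive = FALSE
  )

  # Parse strings such as "mean(var1)" into expressions once,
  # and reuse the strings as output column names
  exprs <- rlang::parse_exprs(summarise)
  names(exprs) <- summarise

  out <- lapply(myGroups, function(g) {
    data %>%
      group_by(!!!rlang::syms(g)) %>%
      dplyr::summarise(!!!exprs, .groups = "drop")
  })

  # Stack the pieces; grouping columns absent from a combination become NA
  bind_rows(out) %>% select(all_of(variables), everything())
}

combSummarise2(mtcars, variables = c("cyl", "vs"),
               summarise = c("length(mpg)", "mean(mpg)"))
```

Rows where `vs` is NA are the `cyl`-only summaries, and vice versa. This mirrors how `plyr::rbind.fill` fills missing columns in the original.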

Call of `combSummarise`

combSummarise(myData, variables=c("var2", "var3", "var4"), 
              summarise=c("length(var1)", "mean(var1)", "max(var1)"))

combSummarise(myData, variables=c("var2", "var4"), 
              summarise=c("length(var1)", "mean(var1)", "max(var1)"))

combSummarise(myData, variables=c("var2", "var4"), 
              summarise=c("length(var1)"))

etc.

answered Oct 20 '22 17:10

dimitris_ps

Inspired by the answers by Gregor and dimitris_ps, I wrote a dplyr-style function that runs summarise() for all combinations of the data's grouping variables.

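library(dplyr)
library(purrr)

summarise_combo <- function(data, ...) {

  # Grouping variables of the input, as symbols
  groupVars <- group_vars(data) %>% map(as.name)

  # Every subset of the grouping variables, from size 0 up to all of them
  groupCombos <- map(0:length(groupVars), ~combn(groupVars, ., simplify = FALSE)) %>%
    unlist(recursive = FALSE)

  # Summarise once per subset and stack the results
  results <- groupCombos %>% 
    map(function(x) {data %>% group_by(!!! x) %>% summarise(...)}) %>%
    bind_rows()

  # Grouping columns first
  results %>% select(!!! groupVars, everything())
}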
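As I read it, `summarise_combo` relies on `bind_rows()` filling grouping columns that are absent from a piece with NA. An NA in a grouping column of the result therefore stands for "all values" of that variable, which assumes the grouping variables themselves contain no NA. Here is a minimal illustration of that mechanism on the built-in `mtcars` data:

```r
library(dplyr)

# Two summaries of mtcars at different grouping levels
byCyl   <- mtcars %>% group_by(cyl) %>% summarise(n = n())
overall <- mtcars %>% summarise(n = n())

# bind_rows() matches columns by name and fills the missing `cyl`
# column of `overall` with NA, so that row is the all-cylinders total
stacked <- bind_rows(byCyl, overall)
stacked
```

Here `stacked` has `cyl` values 4, 6, 8 and NA with counts 11, 7, 14 and 32.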

Example

library(tidyverse)
mtcars %>% group_by(cyl, vs) %>% summarise_combo(cyl_n = n(), mean(mpg))

answered Oct 20 '22 15:10

Sanghoon
