Programmatically dropping a `group_by` field in dplyr

Tags: r, dplyr

I'm writing functions that take in a data.frame and then do some operations. I need to add variables to, and remove variables from, the group_by criteria in order to get where I want to go.

If I want to add a group_by criterion to a df, that's pretty easy:

library(tidyverse)
set.seed(42)
n <- 10
input <- data.frame(a = 'a', 
                    b = 'b' , 
                    vals = 1
)

input %>%
  group_by(a) -> 
grouped 

grouped
#> # A tibble: 1 x 3
#> # Groups:   a [1]
#>   a     b      vals
#>   <fct> <fct> <dbl>
#> 1 a     b        1.

## add a group:
grouped %>% 
  group_by(b, add=TRUE)
#> # A tibble: 1 x 3
#> # Groups:   a, b [1]
#>   a     b      vals
#>   <fct> <fct> <dbl>
#> 1 a     b        1.

## drop a group?

But how do I programmatically drop the grouping by b which I added, yet keep all other groupings the same?
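For anyone reproducing the example above with a current dplyr: the `add = TRUE` argument was renamed `.add = TRUE` in dplyr 1.0.0 (the old spelling is deprecated), and `group_vars()` returns the current grouping variables as a character vector, which is a natural starting point for removing one. A minimal sketch, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

input <- data.frame(a = "a", b = "b", vals = 1)

# Group by a, then add b on top of the existing grouping
grouped <- input %>% group_by(a)
regrouped <- grouped %>% group_by(b, .add = TRUE)

# group_vars() reports the grouping variables as a character vector
group_vars(grouped)    # returns "a"
group_vars(regrouped)  # returns c("a", "b")
```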

asked May 08 '18 15:05 by JD Long

1 Answer

Here's an approach that uses tidyeval so that bare column names can be used as the function arguments. I'm not sure if it makes sense to convert the bare column names to text (as I've done below) or if there's a more elegant way to work directly with the bare column names.
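If converting to text is acceptable, dplyr's `group_vars()` already returns the grouping as strings, so the whole comparison can be done on character vectors. The sketch below is an illustrative alternative, not the answer's own code; `drop_groups_chr` is a hypothetical name, and it assumes dplyr 0.7 or later plus rlang (for `ensyms()`, `as_string()` and `syms()`).

```r
library(dplyr)

# Hypothetical helper: drop the named grouping variables, keep the rest
drop_groups_chr <- function(data, ...) {
  # Capture the bare column names in ... as symbols, then convert to strings
  drop <- vapply(rlang::ensyms(...), rlang::as_string, character(1))
  current <- group_vars(data)

  # Warn about requested columns the data is not grouped by
  missing <- setdiff(drop, current)
  if (length(missing) > 0) {
    warning("Input data frame is not grouped by: ",
            paste(missing, collapse = ", "))
  }

  # Regroup by the remaining variables (replaces the old grouping)
  group_by(data, !!!rlang::syms(setdiff(current, drop)))
}

d <- mtcars %>% group_by(cyl, vs, am)
group_vars(drop_groups_chr(d, vs, cyl))  # returns "am"
```

Because `group_by()` without `.add = TRUE` replaces the existing grouping, regrouping by the leftover names is enough; no explicit `ungroup()` is needed first.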

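drop_groups = function(data, ...) {

  # Current grouping variables and the columns to drop, both as strings
  groups = map_chr(groups(data), rlang::quo_text)
  drop = map_chr(quos(...), rlang::quo_text)

  # Warn about any requested columns the data isn't grouped by
  if(any(!drop %in% groups)) {
    warning(paste("Input data frame is not grouped by the following groups:", 
                  paste(drop[!drop %in% groups], collapse=", ")))
  }

  # Regroup by the remaining variables (this replaces the old grouping)
  data %>% group_by_at(setdiff(groups, drop))

}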

d = mtcars %>% group_by(cyl, vs, am)

groups(d %>% drop_groups(vs, cyl))
[[1]]
am
groups(d %>% drop_groups(a, vs, b, c))
[[1]]
cyl

[[2]]
am

Warning message:
In drop_groups(., a, vs, b, c) :
  Input data frame is not grouped by the following groups: a, b, c

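UPDATE: The approach below works directly with the column names as symbols, without converting them to strings first. I'm not sure which approach is "preferred" in the tidyeval paradigm, or whether there is yet another, more desirable method.

drop_groups2 = function(data, ...) {

  # groups() returns the current grouping variables as a list of symbols;
  # ensyms() captures the bare names in ... as symbols too, so the two
  # lists can be compared (quos() would give quosures, which never match)
  groups = groups(data)
  drop = rlang::ensyms(...)

  if(any(!drop %in% groups)) {
    warning(paste("Input data frame is not grouped by the following groups:", 
                  paste(drop[!drop %in% groups], collapse=", ")))
  }

  # Splice the remaining grouping symbols back into group_by()
  keep = groups[!groups %in% drop]
  data %>% group_by(!!!keep)

}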
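For readers on dplyr 1.0.0 or later, `ungroup()` itself accepts column selections and removes only those variables from the grouping, which may make a custom helper unnecessary. A minimal sketch, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

d <- mtcars %>% group_by(cyl, vs, am)

# ungroup() with arguments drops just the selected grouping variables
group_vars(d %>% ungroup(vs, cyl))  # returns "am"

# ungroup() with no arguments still removes all grouping
group_vars(d %>% ungroup())         # returns character(0)
```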
answered Nov 08 '22 08:11 by eipi10