Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I write a tidyverse-friendly function that respects group_by() earlier in the pipe?

Tags:

r

dplyr

tidyr

I've started working on writing functions to make table generation quicker, but want to make the function respect earlier grouping choices made by the user in the pipe.

Example data:

df<-data.frame(ID=c("A","B","C","A","C","D","A","C","E","B","C","A"),
           Year=c(1,1,1,2,2,2,3,3,3,4,4,4),
           Credits=c(1,3,4,5,6,7,2,1,1,6,1,2),
           Major=c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
           Status=c("green","blue","green","blue","green","blue","green","blue","green","blue","green","blue"),
           Group=c("Art","Music","Science","Art","Music","Science","Art","Music","Science","Art","Music","Science"))

The following is the function I'm working on, and it requires/accepts a variable to define cohorts, a credit variable, and a term variable.

table_headsfte_cohorts<-function(.data,cohortvar,credits,term){


  cohortvar<-rlang::ensym(cohortvar)
  credits<-rlang::ensym(credits)
  term<-rlang::ensym(term)


  .data%>%
    group_by(!!term,Pidm)%>%
    group_by(!!term,!!cohortvar,group_cols())%>%
    mutate(on3=1)%>%
    mutate(`Headcount`=sum(on3),
          `FTE`=round(sum(na.omit(!!credits))/15,1))%>%
    mutate(Variable=paste0(cohortvar))%>%
    mutate(Category=!!cohortvar)%>%
    select(-!!cohortvar)%>%
    select(Variable,Category,Headcount,FTE,group_cols())
}

For a user that may be interested in using additional grouping variables beyond the cohort variable they choose, I am hoping that the end result function would allow usage as follows:

df2<-df%>%
 group_by(Status,Group)%>%
 table_headsfte_cohorts(Major,Credits,Year)

The desired end result would be a table that respects and preserves the levels of the two grouping variables in the group_by statement above in addition to the cohortvar and term columns coming from the table_headsfte_cohorts() arguments.

I need to generate this same table, but for a wide range of grouping variables, and varying numbers of grouping variables, so flexibility would be very helpful.

Edit:

The following seems to get close, by at least allowing multiple grouping variables. This isn't quite what I'm hoping for, as I'd prefer that the additional grouping arguments are read from up the pipe:

 table_headsfte_cohorts<-function(.data,cohortvar,credits,term,...){

  grps<-enquos(...)

  cohortvar<-rlang::ensym(cohortvar)
  credits<-rlang::ensym(credits)
  term<-rlang::ensym(term)


  .data%>%
      group_by(!!term,!!cohortvar,!!! grps)%>%
     mutate(on3=1)%>%
     mutate(`Headcount`=sum(on3),
          `FTE`=round(sum(na.omit(!!credits))/15,1))%>%
     mutate(Variable=paste0(cohortvar))%>%
     mutate(Category=!!cohortvar)%>%
     select(-!!cohortvar)%>%
     select(Variable,Category,Headcount,FTE,!!!grps)

}

Using the above, I can successfully run:

fdfout<-fdf%>%
table_headsfte_cohorts(Major, Credits, Year), getting:

enter image description here

and I can also pass the other variables to the function to serve as additional grouping variables:

fdfout_alt<-fdf%>%
  table_headsfte_cohorts(Major,Credits,Year,Status,Group)

yielding the desired result:

enter image description here

Unfortunately, when I use

fdf_no<-fdf%>%
  group_by(Status, Group)%>%
  table_headsfte_cohorts(Major, Credits, Year)

I get:

enter image description here

This output would likely confuse someone using my function, as their group_by() line seems to do nothing.

like image 741
Pake Avatar asked Aug 31 '25 03:08

Pake


1 Answers

I added some lines that will merge both, the existing grouping variables and the new grouping variables inside the dots into one character vector. We can get the existing grouping variablers with group_vars. To merge old and new together we have to get the expression get_expr of the quoted grouping variables and turn them into strings. We can use !!! syms to evaluate and all_of to select the grouping variables.

Is this what you had in mind?

table_headsfte_cohorts <- function(.data, cohortvar, credits, term, ...){
  
  new_grps <- enquos(...)
  new_grps <- purrr::map_chr(new_grps, ~ as.character(rlang::get_expr(.x)))
  ex_grps  <- group_vars(.data)
  grp_vars <- c(ex_grps, new_grps)

  cohortvar<-rlang::ensym(cohortvar)
  credits<-rlang::ensym(credits)
  term<-rlang::ensym(term)
  
  
  .data%>%
    group_by(!! term,
             !! cohortvar,
             !!! syms(grp_vars))%>%
    mutate(on3 = 1) %>%
    mutate(`Headcount`= sum(on3),
           `FTE`= round(sum(na.omit(!!credits))/15,1))%>%
    mutate(Variable=paste0(cohortvar))%>%
    mutate(Category=!!cohortvar)%>%
    select(-!!cohortvar)%>%
    select(Variable,Category,Headcount,FTE, all_of(grp_vars))
  
}

df %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts(Major, Credits, Year)

#> Adding missing grouping variables: `Major`
#> Adding missing grouping variables: `Year`, `Major`
#> # A tibble: 12 x 8
#> # Groups:   Year, Major, Status, Group [12]
#>     Year Major Variable Category Headcount   FTE Status Group  
#>    <dbl> <chr> <chr>    <chr>        <dbl> <dbl> <chr>  <chr>  
#>  1     1 GS    Major    GS               1   0.1 green  Art    
#>  2     1 GS    Major    GS               1   0.2 blue   Music  
#>  3     1 LA    Major    LA               1   0.3 green  Science
#>  4     2 GS    Major    GS               1   0.3 blue   Art    
#>  5     2 GS    Major    GS               1   0.4 green  Music  
#>  6     2 LA    Major    LA               1   0.5 blue   Science
#>  7     3 GS    Major    GS               1   0.1 green  Art    
#>  8     3 LA    Major    LA               1   0.1 blue   Music  
#>  9     3 LA    Major    LA               1   0.1 green  Science
#> 10     4 GS    Major    GS               1   0.4 blue   Art    
#> 11     4 LA    Major    LA               1   0.1 green  Music  
#> 12     4 LA    Major    LA               1   0.1 blue   Science
like image 159
TimTeaFan Avatar answered Sep 02 '25 16:09

TimTeaFan