
Userfunction with optional grouping argument and if else using piping in R

Tags:

r

I recently started writing my own functions to speed up standard, repetitive tasks when analyzing data with R.

At the moment I'm working on a function with three arguments and have run into a problem I have not been able to solve. I would like one of them to be an optional grouping argument. The function should check whether a grouping argument was supplied and then continue with either subfunction 1 or subfunction 2.

But I always get the error "Object not found" if the grouping argument is not NA. How can I do this?

Edit: In my case the filter is usually used to select certain valid or invalid years. If a grouping argument is supplied, more steps follow in the pipe than if there is none.

require(tidyverse)

Data <- mpg

userfunction <- function(DF,Filter,Group) {
  
  without_group <- function(DF) {
    DF %>% 
      count(year)
  }
  
  with_group <- function(DF) {
    DF %>% 
      group_by({{Group}}) %>% 
      count(year) %>% 
      pivot_wider(names_from=year, values_from=n) %>%
      ungroup() %>% 
      mutate(across(.cols=2:ncol(.),.fns=~replace_na(.x, 0))) %>% 
      mutate(Mittelwert=round(rowMeans(.[,2:ncol(.)],na.rm=TRUE),2))
  }
  
  Obj <- DF %>% 
    ungroup() %>% 
    {if(Filter!=FALSE) filter(.,eval(rlang::parse_expr(Filter))) else filter(.,.$year==.$year)} %>%
    {if(is.na(Group)) without_group(.) else with_group(.)} 
  
  return(Obj)
    
}

For NA it already works:

> Data %>% 
+   userfunction(FALSE,NA)
# A tibble: 2 x 2
   year     n
  <int> <int>
1  1999   117
2  2008   117

With a grouping argument it does not work:

> Data %>% 
+   userfunction(FALSE,manufacturer)
 Error in DF %>% ungroup() %>% { : object 'manufacturer' not found

Edit: This is the output I would expect from the function above:

> Data %>% userfunction_exp(FALSE,manufacturer)
# A tibble: 15 x 4
   manufacturer `1999` `2008` Mittelwert
   <chr>         <dbl>  <dbl>      <dbl>
 1 audi              9      9        9  
 2 chevrolet         7     12        9.5
 3 dodge            16     21       18.5
 4 ford             15     10       12.5
 5 honda             5      4        4.5
 6 hyundai           6      8        7  
 7 jeep              2      6        4  
 8 land rover        2      2        2  
 9 lincoln           2      1        1.5
10 mercury           2      2        2  
11 nissan            6      7        6.5
12 pontiac           3      2        2.5
13 subaru            6      8        7  
14 toyota           20     14       17  
15 volkswagen       16     11       13.5

> Data %>% userfunction_exp("cyl==4",manufacturer)
# A tibble: 9 x 4
  manufacturer `1999` `2008`  mean
  <chr>         <dbl>  <dbl> <dbl>
1 audi              4      4   4  
2 chevrolet         1      1   1  
3 dodge             1      0   0.5
4 honda             5      4   4.5
5 hyundai           4      4   4  
6 nissan            2      2   2  
7 subaru            6      8   7  
8 toyota           11      7   9  
9 volkswagen       11      6   8.5

2021-04-01 14:55: Edited to add some information and some more steps to the pipe in with_group.

thuettel asked Nov 30 '25 18:11


2 Answers

I don't know what the Filter argument is used for, so I'll leave it as it is for now.

group_by(A) %>% count(B) gives the same counts as count(A, B), so you can change your function to:

library(tidyverse)

userfunction <- function(DF,Filter,Group = NULL) {
  DF %>% 
    count(year, {{Group}}) %>% 
    pivot_wider(names_from=year, values_from=n)
}

Data %>% userfunction(FALSE)

#   `1999` `2008`
#   <int>  <int>
#1    117    117

Data %>% userfunction(FALSE,manufacturer)
# A tibble: 15 x 3
#   manufacturer `1999` `2008`
#   <chr>         <int>  <int>
# 1 audi              9      9
# 2 chevrolet         7     12
# 3 dodge            16     21
# 4 ford             15     10
# 5 honda             5      4
# 6 hyundai           6      8
# 7 jeep              2      6
# 8 land rover        2      2
# 9 lincoln           2      1
#10 mercury           2      2
#11 nissan            6      7
#12 pontiac           3      2
#13 subaru            6      8
#14 toyota           20     14
#15 volkswagen       16     11
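
As a quick check of the equivalence stated above, this sketch compares the two forms on mpg. The object names are made up for illustration, and ggplot2 is loaded only to get the dataset. The one difference I'm aware of is that the group_by() form returns a grouped tibble, so it is ungrouped before comparing.

```r
library(dplyr)
library(ggplot2)  # only for the mpg dataset

# Form 1: group first, then count within each group
grouped_then_counted <- mpg %>%
  group_by(manufacturer) %>%
  count(year) %>%
  ungroup()  # drop the grouping that group_by() leaves behind

# Form 2: pass both variables to count() directly
counted_directly <- mpg %>%
  count(manufacturer, year)

# Compare as plain data frames so only the values matter
same_counts <- isTRUE(all.equal(as.data.frame(grouped_then_counted),
                                as.data.frame(counted_directly)))
same_counts
```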

Note that I have given Group a default value of NULL, so the argument is ignored when you don't supply it.
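
For context, here is a minimal sketch of why this avoids the error from the question. As far as I can tell, is.na(Group) forces R to evaluate manufacturer as an ordinary object, while {{ }} (or enquo()) only captures the expression. The helper names eager_check and captured_check are hypothetical, and the sketch assumes there is no object called manufacturer in your workspace.

```r
library(rlang)

# Evaluating the argument directly makes R look up `manufacturer` as an
# ordinary object, which fails because it is only a column name.
eager_check <- function(Group = NULL) is.na(Group)

# enquo() captures the expression without evaluating it, so the function
# can test whether anything other than the NULL default was supplied.
captured_check <- function(Group = NULL) quo_is_null(enquo(Group))

eager_message <- tryCatch(
  eager_check(manufacturer),
  error = function(e) conditionMessage(e)
)

eager_message                 # the "object ... not found" error text
captured_check(manufacturer)  # FALSE: a grouping variable was supplied
captured_check()              # TRUE: nothing supplied, default NULL is used
```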

Ronak Shah answered Dec 02 '25 08:12


Hi, this is a good question!

There are multiple ways to achieve this, as the previous answers pointed out. One way to do it in the tidyverse is tidy evaluation.

Leaving out your filter step (which you could explain in more detail):

my_summary <- function(df, grouping_var) {
  grp_var <- enquo(grouping_var)  # capture the grouping variable
  df %>% my_group_by(grp_var)
}


my_group_by <- function(df, grouping_var){
  # Check if group is supplied 
  if(rlang::quo_is_missing(grouping_var)) {
    df %>% without_group()
  } else {
    df %>% with_group(grouping_var)
  }
  
}


without_group <- function(df) {
  # do whatever without group
  df %>% 
    count(year)
}

with_group <- function(df, grouping_var) {
  # do whatever with group
  df %>% 
    group_by(!!grouping_var) %>% #Note the !!
    count(year) %>% 
    pivot_wider(names_from=year, values_from=n)
}

Called without a grouping argument, this gives:

> mpg %>% my_summary()
# A tibble: 2 x 2
   year     n
  <int> <int>
1  1999   117
2  2008   117

With a grouping variable passed to the function:

> mpg %>% my_summary(model)
# A tibble: 38 x 3
# Groups:   model [38]
   model              `1999` `2008`
   <chr>               <int>  <int>
 1 4runner 4wd             4      2
 2 a4                      4      3
 3 a4 quattro              4      4
 4 a6 quattro              1      2
 5 altima                  2      4
 6 c1500 suburban 2wd      1      4
 7 camry                   4      3
 8 camry solara            4      3
 9 caravan 2wd             6      5
10 civic                   5      4
# ... with 28 more rows
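
Putting the pieces together, here is a rough sketch of how the filter and the extra grouped steps from the question might be added back. It is only one possible arrangement, not a definitive version. The name userfunction_sketch is made up, and I'm assuming Filter is either NULL or a string such as "cyl==4" (the question used FALSE for "no filter"), and that Group defaults to NULL rather than being left missing.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # only for the mpg dataset

# Hypothetical combined function; Filter = NULL means "no filter" (assumption)
userfunction_sketch <- function(DF, Filter = NULL, Group = NULL) {
  group_quo <- rlang::enquo(Group)  # capture the column name, do not evaluate

  DF <- ungroup(DF)
  if (!is.null(Filter)) {
    # Turn the filter string into an expression and apply it
    DF <- filter(DF, !!rlang::parse_expr(Filter))
  }

  # No grouping variable supplied: short branch
  if (rlang::quo_is_null(group_quo)) {
    return(count(DF, year))
  }

  # Grouping variable supplied: count, reshape, then add the row mean
  wide <- DF %>%
    count(!!group_quo, year) %>%
    pivot_wider(names_from = year, values_from = n, values_fill = 0L)

  # Every column except the first (the grouping column) holds counts
  wide$Mittelwert <- round(rowMeans(wide[-1]), 2)
  wide
}

userfunction_sketch(mpg)
userfunction_sketch(mpg, Group = manufacturer)
userfunction_sketch(mpg, Filter = "cyl==4", Group = manufacturer)
```

Using values_fill in pivot_wider() takes the place of the separate replace_na() step from the question, and the mean is computed outside the pipe so that no `.` placeholder is needed.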
SEcker answered Dec 02 '25 10:12