I do this a lot:
library(tidyverse)
iris %>%
group_by(Species) %>%
summarise(num_Species = n_distinct(Species)) %>%
mutate(perc_Species = 100 * num_Species / sum(num_Species))
So I would like to create a function that outputs the same thing but with dynamically named num_ and perc_ columns:
num_perc <- function(df, group_var, summary_var) {
}
I found this resource useful but it did not directly address how to reuse newly created column names in the way I want.
What you can do is use as_label(enquo()) on your group_var to extract variable passed as a character vector to generate your new columns. You can see a clear example of this is 6.1.3 in the linked document you sent. In this way, we can dynamically prepend num_ and perc_ to your summary variable, and just have to pass in df and group_var.
library(dplyr)
num_perc <- function(df, group_var) {
summary_lbl <- as_label(enquo(group_var))
num_lbl <- paste0("num_", summary_lbl)
perc_lbl <- paste0("perc_", summary_lbl)
df %>%
group_by({{ group_var }}) %>%
summarize(!!num_lbl := n_distinct({{ group_var }})) %>%
mutate(!!perc_lbl := 100 * .data[[num_lbl]] / sum(.data[[num_lbl]]))
}
num_perc(iris, Species)
#> # A tibble: 3 × 3
#> Species num_Species perc_Species
#> <fct> <int> <dbl>
#> 1 setosa 1 33.3
#> 2 versicolor 1 33.3
#> 3 virginica 1 33.3
In this case where group_var and summary_var actually differ, it's the same solution essentially.
num_perc <- function(df, group_var, summary_var) {
summary_lbl <- as_label(enquo(summary_var))
num_lbl <- paste0("num_", summary_lbl)
perc_lbl <- paste0("perc_", summary_lbl)
df %>%
group_by({{ group_var }}) %>%
summarize(!!num_lbl := n_distinct({{ summary_var }})) %>%
mutate(!!perc_lbl := 100 * .data[[num_lbl]] / sum(.data[[num_lbl]]))
}
num_perc(iris, Species, Species)
Another possible solution, which uses deparse(substitute(...)) to get the name of the function parameters as strings:
library(tidyverse)
f <- function(df, group_var, summary_var)
{
group_var <- deparse(substitute(group_var))
summary_var <- deparse(substitute(summary_var))
df %>%
group_by(!!sym(group_var)) %>%
summarise(!!str_c("num_", summary_var) := n_distinct(summary_var)) %>%
mutate(!!str_c("per_", summary_var) := 100 * !!sym(str_c("num_", summary_var)) / sum(!!sym(str_c("num_", summary_var))))
}
f(iris, Species, Species)
#> # A tibble: 3 × 3
#> Species num_Species per_Species
#> <fct> <int> <dbl>
#> 1 setosa 1 33.3
#> 2 versicolor 1 33.3
#> 3 virginica 1 33.3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With