I recently learned how to access column names inside a user-defined function: How to access a column name in a user defined function with dplyr?
However, I now also need to refer to the newly created columns by name within the same set of operations. For example, I would like to do this:
library(dplyr)
samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3, 7, 9))
calculateSummaries <- function(df, variable){
  df <- df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           # does not work: the right-hand side adds two strings, not two columns
           "sd_plus_mean_of_{{variable}}" := ("mean_of_{{variable}}" + "sd_of_{{variable}}")
    )
}
df_result <- calculateSummaries(samp_df, var_in_df)
Of course I could do:
"sd_plus_mean_of_{{variable}}" := mean({{variable}}) + sd({{variable}})
But with the real data, repeating every calculation like this won't be practical.
Does anyone know how to do this?
This case is indeed a little bit tricky. I think we have to construct the names first and then use !! sym() to turn the strings into symbols that refer to the new columns.
library(dplyr)
samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3, 7, 9))
calculateSummaries <- function(df, variable){
  # capture the column name passed as `variable` as a string
  var_nm <- deparse(substitute(variable))
  # build the names of the columns that mutate() creates below
  mean_var_nm <- paste0("mean_of_", var_nm)
  sd_var_nm <- paste0("sd_of_", var_nm)
  df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           # sym() turns each name into a symbol; !! injects it as a column reference
           "sd_plus_mean_of_{{variable}}" := !! sym(mean_var_nm) + !! sym(sd_var_nm)
    )
}
calculateSummaries(samp_df, var_in_df)
#> # A tibble: 3 x 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
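To see what !! sym() does in isolation, here is a minimal sketch. The tibble `toy` and the variable `col_nm` are hypothetical names used only for illustration; they are not part of the question.

```r
library(dplyr)
library(rlang)

toy <- tibble(x = c(1, 2, 3))   # hypothetical example data
col_nm <- "x"                   # a column name held as a string

# sym() converts the string into a symbol and !! injects that symbol into
# the expression, so mutate() evaluates it as if we had typed `x * 2`.
res <- toy %>% mutate(doubled = (!!sym(col_nm)) * 2)
res$doubled
```

Without the injection, `col_nm * 2` would try to multiply the string "x" itself and fail, which is the same problem as in the question's attempt.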
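An alternative is to use across(), but we still have to construct the variable names. Note that across() names the new column var_in_df_sd_plus_mean_of by default, i.e. with the column name first.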
calculateSummaries <- function(df, variable){
  df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           across(c({{ variable }}),
                  list(sd_plus_mean_of = ~ get(paste0("mean_of_", cur_column())) + get(paste0("sd_of_", cur_column())))
           )
    )
}
calculateSummaries(samp_df, var_in_df)
#> # A tibble: 3 x 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df var_in_df_sd_plus_mean_of
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
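If the new column should be called sd_plus_mean_of_var_in_df, as in the other approaches, the .names argument of across() can probably be used to set the naming pattern. The following is a sketch of that variation, not something taken from the original answer; it passes a single lambda instead of a named list and relies on the `{.col}` placeholder in .names.

```r
library(dplyr)

samp_df <- tibble(var1 = c("a", "b", "c"),
                  var_in_df = c(3, 7, 9))

calculateSummaries <- function(df, variable) {
  df %>%
    mutate("mean_of_{{ variable }}" := mean({{ variable }}),
           "sd_of_{{ variable }}" := sd({{ variable }}),
           across({{ variable }},
                  # look up the two helper columns created just above
                  ~ get(paste0("mean_of_", cur_column())) +
                    get(paste0("sd_of_", cur_column())),
                  # {.col} is replaced by the name of the current column
                  .names = "sd_plus_mean_of_{.col}"))
}

res <- calculateSummaries(samp_df, var_in_df)
names(res)
```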
Here is a final way, inspired by Lionel Henry's answer to this question. We can use rlang::englue() to construct the names and then look up those names with the .data[[...]] pronoun.
calculateSummaries <- function(df, variable){
  mean_var_nm <- rlang::englue("mean_of_{{ variable }}")
  sd_var_nm <- rlang::englue("sd_of_{{ variable }}")
  df %>%
    mutate("mean_of_{{ variable }}" := mean({{ variable }}),
           "sd_of_{{ variable }}" := sd({{ variable }}),
           "sd_plus_mean_of_{{ variable }}" := .data[[mean_var_nm]] + .data[[sd_var_nm]]
    )
}
calculateSummaries(samp_df, var_in_df)
#> # A tibble: 3 x 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
Created on 2022-10-13 by the reprex package (v2.0.1)
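To make the rlang::englue() step in the last approach more concrete, here is a small sketch of what it produces on its own. The helper `name_for()` is a hypothetical name introduced only for this illustration, and englue() needs a reasonably recent rlang (as far as I know, 1.0.0 or later).

```r
library(rlang)

# Hypothetical helper: builds the name of the "mean" column for whatever
# bare column name is passed as `variable`.
name_for <- function(variable) {
  # {{ variable }} is replaced by the text of the argument the caller supplied
  englue("mean_of_{{ variable }}")
}

nm <- name_for(var_in_df)
nm
```

Here `nm` holds the string "mean_of_var_in_df", which is why it can be used directly in .data[[nm]] inside mutate().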