Consider a tibble where each column is a character vector which can take many values -- let's say "A" through "F".
library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))
I wish to create a function which takes a column name as an argument, and recodes that column so that any answer "A" becomes an NA and the df is otherwise returned as is. The reason for designing it this way is to fit into a broader pipeline that performs a series of operations using a given column.
There are many ways to do this. But I am interested in understanding what the best idiomatic tidy_eval/tidyverse approach would be. First, the question name needs to be on the left hand side of a mutate verb, so we use the !! and := operators appropriately. But then, what to put on the right hand side?
fix_question <- function(df, question) {
df %>% mutate(!!question := recode(... something goes here...))
}
fix_question(sample_df, "q1") # should produce a tibble whose first column is (NA, "B", "C")
My initial thought was that this would work:
df %>% mutate(!!question := recode(!!question, "A" = NA_character_))
But of course the bang-bang inside recode() just returns the literal character string (e.g. "q1"), not the column. I ended up taking what feels like a hacky route to reference the data on the right hand side, using the base R [[ operator and relying on the . placeholder of the %>% pipe, and it works, so in a sense I have solved my underlying problem:
df %>% mutate(!!question := recode(.[[question]], "A" = NA_character_))
I'm interested in getting feedback from people who are very good at tidyeval as to whether there is a more idiomatic way to do this, in hopes that seeing a worked example would enhance my understanding of the tidyeval function set more generally. Any thoughts?
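One way to see why the first attempt fails is to build the expression without evaluating it. The sketch below uses rlang::expr(), which is not part of the original question, to show what !! injects when question holds a string versus a symbol. The variable names are illustrative only.

```r
library(rlang)

question <- "q1"

# Unquoting a string injects the string itself, so recode() would receive "q1"
with_string <- expr(recode(!!question, A = NA_character_))

# Unquoting a symbol injects a reference to the column q1
with_symbol <- expr(recode(!!sym(question), A = NA_character_))

print(with_string)
print(with_symbol)
```

The first expression contains the literal "q1" as the first argument of recode(), while the second contains the bare name q1, which mutate() can look up in the data.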
You can use the "curly curly" method now if you have rlang >= 0.4.0.
Explanation thanks to @eipi10:
This combines the two step process of quote-then-unquote into one step, so {{question}} is equivalent to !!enquo(question)
fix_question <- function(df, question){
df %>% mutate({{question}} := recode({{question}}, A = NA_character_))
}
fix_question(sample_df, q1)
# # A tibble: 3 x 2
# q1 q2
# <chr> <chr>
# 1 NA B
# 2 B B
# 3 C A
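For comparison, here is a sketch of the explicit two-step version that the curly-curly form abbreviates. The function name fix_question_enquo is made up for this illustration, and it assumes a version of rlang that accepts a quosure on the left of := (it is unwrapped to its column name).

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical name: the quote-then-unquote equivalent of the {{ }} version
fix_question_enquo <- function(df, question) {
  question <- enquo(question)  # step 1: capture the caller's expression
  df %>%
    mutate(!!question := recode(!!question, A = NA_character_))  # step 2: inject it
}

result <- fix_question_enquo(sample_df, q1)
print(result)
```

Called with a bare column name, this should give the same tibble as the curly-curly version.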
Note that unlike the ensym approach, this doesn't work with character names. Even worse, it does the wrong thing instead of just giving an error: recode() is applied to the literal string "q1", and the result is recycled down the whole column.
fix_question(sample_df, 'q1')
# # A tibble: 3 x 2
# q1 q2
# <chr> <chr>
# 1 q1 B
# 2 q1 B
# 3 q1 A
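If the silent failure is a concern, one option is to check the captured argument before using it. This is only a sketch: the name fix_question_checked and the error message are invented here, and it relies on rlang::quo_is_symbol() to test whether the caller passed a bare name.

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical variant that rejects anything other than a bare column name
fix_question_checked <- function(df, question) {
  if (!rlang::quo_is_symbol(enquo(question))) {
    rlang::abort("`question` must be a bare column name, e.g. q1 rather than \"q1\".")
  }
  df %>% mutate({{ question }} := recode({{ question }}, A = NA_character_))
}

ok <- fix_question_checked(sample_df, q1)

# A string now raises an error instead of overwriting the column
outcome <- tryCatch(
  fix_question_checked(sample_df, "q1"),
  error = function(e) "error"
)
```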
You can make the function a bit more flexible by allowing a vector of recoded values to be entered as an argument as well. For example:
library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))
fix_question <- function(df, question, recode.vec) {
df %>% mutate({{question}} := recode({{question}}, !!!recode.vec))
}
fix_question(sample_df, q1, c(A=NA_character_, B="Was B"))
  q1    q2
1 <NA>  B
2 Was B B
3 C     A
Note that recode.vec is "unquote-spliced" with !!!. You can see what this is doing with this example, adapted from the Programming with dplyr vignette (search for "splice" to see the relevant examples). Note how !!! "splices" the pairs of recoding values into the recode function so that they are used as the ... argument in recode.
x = c("A", "B", "C")
args = c(A=NA_character_, B="Was B")
quo(recode(x, !!!args))
<quosure>
expr: ^recode(x, A = <chr: NA>, B = "Was B")
env: global
If you want to potentially run the recoding function on multiple columns, you can turn it into a function that takes just a column name and a recoding vector. This approach seems like it would be more pipe-friendly.
fix_question <- function(question, recode.vec) {
recode({{question}}, !!!recode.vec)
}
sample_df %>%
mutate_at(vars(matches("q")), list(~fix_question(., c(A=NA_character_, B="Was B"))))
  q1    q2
1 <NA>  Was B
2 Was B Was B
3 C     <NA>
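With dplyr 1.0.0 or later, the same multi-column recoding can be written with across(), which supersedes mutate_at(). The sketch below assumes that version of dplyr; the helper name recode_answers is invented for this example and takes a plain vector rather than a column name.

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical helper: recodes a plain vector using a named vector of replacements
recode_answers <- function(x, recode_vec) {
  recode(x, !!!recode_vec)
}

# Apply the helper to every column whose name matches "q"
recoded <- sample_df %>%
  mutate(across(matches("q"), ~ recode_answers(.x, c(A = NA_character_, B = "Was B"))))

print(recoded)
```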
Or to recode a single column:
sample_df %>%
mutate(q1 = fix_question(q1, c(A=NA_character_, B="Was B")))
Here, on the right side of :=, we can specify sym to convert to symbol and then evaluate (!!)
fix_question <- function(df, question) {
df %>%
mutate(!!question := recode(!! rlang::sym(question), "A" = NA_character_))
}
fix_question(sample_df, "q1")
# A tibble: 3 x 2
# q1 q2
# <chr> <chr>
#1 <NA> B
#2 B B
#3 C A
A better approach that would work for both quoted and unquoted input is ensym
fix_question <- function(df, question) {
question <- ensym(question)
df %>%
mutate(!!question := recode(!! question, "A" = NA_character_))
}
fix_question(sample_df, q1)
# A tibble: 3 x 2
# q1 q2
# <chr> <chr>
#1 <NA> B
#2 B B
#3 C A
fix_question(sample_df, "q1")
# A tibble: 3 x 2
# q1 q2
# <chr> <chr>
#1 <NA> B
#2 B B
#3 C A
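When the column name always arrives as a string, the .data pronoun is another option that avoids the symbol conversion. This is a sketch of that alternative, not the approach from the answer above; the name fix_question_str is made up for the example.

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical string-only variant: .data[[question]] looks the column up by name
fix_question_str <- function(df, question) {
  df %>%
    mutate(!!question := recode(.data[[question]], A = NA_character_))
}

by_name <- fix_question_str(sample_df, "q1")
print(by_name)
```

Unlike ensym, this version does not accept bare column names, so it suits pipelines where the names are already stored as character values.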