Use of tidyeval based non-standard evaluation in recode in right-hand side of mutate

Consider a tibble where each column is a character vector which can take many values -- let's say "A" through "F".

library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

I wish to create a function which takes a column name as an argument, and recodes that column so that any answer "A" becomes an NA and the df is otherwise returned as is. The reason for designing it this way is to fit into a broader pipeline that performs a series of operations using a given column.

There are many ways to do this, but I am interested in the most idiomatic tidyeval/tidyverse approach. First, the column name needs to be on the left-hand side of the assignment inside mutate, so we use the !! and := operators. But then, what goes on the right-hand side?

fix_question <- function(df, question) {
    df %>% mutate(!!question := recode(... something goes here...))
}

fix_question(sample_df, "q1") # should produce a tibble whose first column is (NA, "B", "C")

My initial thought was that this would work:

df %>% mutate(!!question := recode(!!question, "A" = NA_character_))

But of course the bang-bang inside recode just returns the literal character string (e.g. "q1") rather than the column. I ended up taking what feels like a hacky route to reference the data on the right-hand side, using the base R [[ operator and relying on the . placeholder from the magrittr pipe. It works, so in a sense I have solved my underlying problem:

df %>% mutate(!!question := recode(.[[question]], "A" = NA_character_))

I'm interested in getting feedback from people who are very good at tidyeval as to whether there is a more idiomatic way to do this, in hopes that seeing a worked example would enhance my understanding of the tidyeval function set more generally. Any thoughts?

asked Oct 11 '19 at 17:10 by aaron


3 Answers

You can use the "curly curly" method now if you have rlang >= 0.4.0.

Explanation thanks to @eipi10:

This combines the two-step process of quote-then-unquote into one step, so {{question}} is equivalent to !!enquo(question).

fix_question <- function(df, question){
  df %>% mutate({{question}} := recode({{question}}, A = NA_character_))
}

fix_question(sample_df, q1)
# # A tibble: 3 x 2
#   q1    q2   
#   <chr> <chr>
# 1 NA    B    
# 2 B     B    
# 3 C     A    
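
As a rough illustration of that equivalence, the two-step version can be sketched as below. The function name fix_question_enquo is made up for this sketch, and rlang::as_name() is used on the left-hand side only so that := receives a plain string; this is one way to write it, not necessarily how the answer's author would.

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical name: spells out what {{question}} does in two steps.
fix_question_enquo <- function(df, question) {
  q <- enquo(question)  # step 1: quote the argument as a quosure
  df %>%
    # step 2: unquote it; as_name() turns the quosure into the string "q1"
    mutate(!!rlang::as_name(q) := recode(!!q, A = NA_character_))
}

res_enquo <- fix_question_enquo(sample_df, q1)
res_enquo
```

The result should match the {{question}} version, with "A" in q1 replaced by NA.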

Note that, unlike the ensym approach, this doesn't work with character names. Even worse, instead of raising an error it silently overwrites the column with the literal string "q1":

fix_question(sample_df, 'q1')

# # A tibble: 3 x 2
#   q1    q2   
#   <chr> <chr>
# 1 q1    B    
# 2 q1    B    
# 3 q1    A    
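
If callers only have the column name as a string, one option that is not part of the answer above is the .data pronoun, which dplyr's data-masking verbs understand. A minimal sketch, assuming a single string is passed; the name fix_question_chr and the input check are made up for illustration:

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical name: a string-only variant built on the .data pronoun.
fix_question_chr <- function(df, question) {
  # Fail early instead of silently recoding a literal string
  stopifnot(is.character(question), length(question) == 1)
  df %>%
    mutate(!!question := recode(.data[[question]], A = NA_character_))
}

res_chr <- fix_question_chr(sample_df, "q1")
res_chr
```

Here .data[[question]] looks the column up by name inside the data frame, so the string is never treated as the value to recode.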
answered Nov 17 '22 at 20:11 by IceCreamToucan


You can make the function a bit more flexible by allowing a vector of recoded values to be entered as an argument as well. For example:

library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

fix_question <- function(df, question, recode.vec) {

  df %>% mutate({{question}} := recode({{question}}, !!!recode.vec))

}

fix_question(sample_df, q1, c(A=NA_character_, B="Was B"))
  q1    q2   
1 <NA>  B    
2 Was B B    
3 C     A

Note that recode.vec is "unquote-spliced" with !!!. You can see what this is doing with this example, adapted from the Programming with dplyr vignette (search for "splice" to see the relevant examples). Note how !!! "splices" the pairs of recoding values into the recode function so that they are used as the ... argument in recode.

x = c("A", "B", "C")
args = c(A=NA_character_, B="Was B")

quo(recode(x, !!!args))

<quosure>
expr: ^recode(x, A = <chr: NA>, B = "Was B")
env:  global
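
To check that splicing really is the same as typing the pairs out, the two calls can be compared directly. This is only a small sanity check; the variable names spliced and by_hand are made up here.

```r
library(dplyr)

x <- c("A", "B", "C")
args <- c(A = NA_character_, B = "Was B")

# !!! expands the named vector into separate named arguments of recode()
spliced <- recode(x, !!!args)

# The same call with the pairs written out by hand
by_hand <- recode(x, A = NA_character_, B = "Was B")

identical(spliced, by_hand)
```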

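If you want to run the recoding on multiple columns, you can turn it into a function that takes just a column (as a vector) and a recoding vector. This approach is more pipe-friendly. Outside a data-masking verb, {{question}} is simply a pair of nested braces that evaluates to question, so it is harmless here but not required.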

fix_question <- function(question, recode.vec) {

  recode({{question}}, !!!recode.vec)

}

sample_df %>% 
  mutate_at(vars(matches("q")), list(~fix_question(., c(A=NA_character_, B="Was B"))))
  q1    q2   
1 <NA>  Was B
2 Was B Was B
3 C     <NA>
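
With dplyr >= 1.0.0, the same multi-column recoding can also be written with across() instead of mutate_at(). A sketch under that version assumption; the helper name recode_answers is made up so it does not clash with fix_question:

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical helper: takes a vector and a named recoding vector.
recode_answers <- function(x, recode.vec) {
  recode(x, !!!recode.vec)
}

res_across <- sample_df %>%
  # Apply the helper to every column whose name matches "q"
  mutate(across(matches("q"),
                ~ recode_answers(.x, c(A = NA_character_, B = "Was B"))))
res_across
```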

Or to recode a single column:

sample_df %>% 
  mutate(q1 = fix_question(q1, c(A=NA_character_, B="Was B")))
answered Nov 17 '22 at 21:11 by eipi10


Here, on the right-hand side of :=, we can use sym to convert the string to a symbol and then unquote it with !! so that it is evaluated as a column:

fix_question <- function(df, question) {
  df %>%
    mutate(!!question := recode(!!rlang::sym(question), "A" = NA_character_))
}

fix_question(sample_df, "q1") 
# A tibble: 3 x 2
#  q1    q2   
#  <chr> <chr>
#1 <NA>  B    
#2 B     B    
#3 C     A    

A better approach, which works for both quoted and unquoted input, is ensym:

fix_question <- function(df, question) {
  # ensym() turns either a bare name or a string into a symbol
  question <- ensym(question)
  df %>%
    mutate(!!question := recode(!!question, "A" = NA_character_))
}


fix_question(sample_df, q1)
# A tibble: 3 x 2
#  q1    q2   
#  <chr> <chr>
#1 <NA>  B    
#2 B     B    
#3 C     A    

fix_question(sample_df, "q1")
# A tibble: 3 x 2
#  q1    q2   
#  <chr> <chr>
#1 <NA>  B    
#2 B     B    
#3 C     A    
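
The reason both calls behave the same can be seen by looking at what ensym captures, compared with enquo (which is what {{ }} is built on). A small sketch; capture_sym and capture_quo are made-up helper names:

```r
library(rlang)

# Hypothetical helpers that only capture their argument.
capture_sym <- function(question) ensym(question)
capture_quo <- function(question) enquo(question)

# ensym() yields the symbol q1 for both a bare name and a string
same_symbol <- identical(capture_sym(q1), capture_sym("q1"))

# enquo() keeps a string as a string, so it is later treated as a value
kept_string <- is.character(quo_get_expr(capture_quo("q1")))

c(same_symbol = same_symbol, kept_string = kept_string)
```

Both flags should be TRUE, which is consistent with a string passed to a {{ }}-based function being recoded as a literal value rather than looked up as a column.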
answered Nov 17 '22 at 22:11 by akrun