I would like to extract the column name in the function call to mutate_if. With this, I then want to look up a value in a different table and fill in missing values with the lookup value. I tried using quosure syntax, but it is not working.
Is there a way to extract the column name directly?
Sample Data
library(dplyr)
df <- structure(list(x = 1:10,
y = c(1L, 2L, 3L, NA, 1L, 2L, 3L, NA, 1L, 2L),
z = c(NA, 2L, 3L, NA, NA, 2L, 3L, NA, NA, 2L),
a = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e")),
.Names = c("x", "y", "z", "a"),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
df_lookup <- tibble(x = 0L, y = 5L, z = 8L)
Not working
Extracting the name directly inside the function does not work:
df %>%
mutate_if(is.numeric, funs({
x <- .
x <- enquo(x)
lookup_value <- df_lookup %>% pull(quo_name(x))
x <- ifelse(is.na(x), lookup_value, x)
return(x)
}))
With an extra function I'm able to extract the name but then the replacement doesn't work anymore.
custom_mutate <- function(v) {
v <- enquo(v)
lookup_value <- df_lookup %>% pull(quo_name(v))
# ifelse(is.na((!!v)), lookup_value, (!!v))
}
df %>%
mutate_if(is.numeric, funs(custom_mutate(v = .)))
Works
If I add the df as an additional argument to my custom function it works, but is there a way without this? It feels wrong and not how dplyr is meant to be used... Correct me if I'm wrong ;)
In addition to this, I have to use UQE instead of !!, and as it says in Programming with dplyr:
UQE() is for expert use only
custom_mutate2 <- function(v, df) {
v <- enquo(v)
lookup_value <- df_lookup %>% pull(quo_name(v))
df %>%
mutate(UQE(v) := ifelse(is.na((!!v)), lookup_value, (!!v))) %>%
pull(!!v)
}
df %>%
mutate_if(is.numeric, funs(custom_mutate2(v = ., df = df)))
Expected output
# A tibble: 10 x 4
# x y z a
# <int> <int> <int> <chr>
# 1 1 1 8 a
# 2 2 2 2 b
# 3 3 3 3 c
# 4 4 5 8 d
# 5 5 1 8 e
# 6 6 2 2 a
# 7 7 3 3 b
# 8 8 5 8 c
# 9 9 1 8 d
# 10 10 2 2 e
You have to use quo instead of enquo:
#enquo(.) :
<quosure: empty>
~function (expr)
{
enexpr(expr)
}
...
#quo(.) :
<quosure: frame>
~x
<quosure: frame>
~y
<quosure: frame>
~z
With your example :
mutate_if(df, is.numeric, funs({
lookup_value <- df_lookup %>% pull(quo_name(quo(.)))
ifelse(is.na(.), lookup_value, .)
}))
# A tibble: 10 x 4
x y z a
<int> <int> <int> <chr>
1 1 1 8 a
2 2 2 2 b
3 3 3 3 c
4 4 5 8 d
5 5 1 8 e
6 6 2 2 a
7 7 3 3 b
8 8 5 8 c
9 9 1 8 d
10 10 2 2 e
Julien Nvarre's answer is absolutely correct (you need to use quo) but, since my first thought would also have been to use enquo, I have looked at why you have to use quo instead.
If we look at the source for mutate_if we can see how it is constructed:
dplyr:::mutate_if
#> function (.tbl, .predicate, .funs, ...)
#> {
#> funs <- manip_if(.tbl, .predicate, .funs, enquo(.funs), caller_env(),
#> ...)
#> mutate(.tbl, !(!(!funs)))
#> }
#> <environment: namespace:dplyr>
By overriding the mutate_if function in dplyr with a slight modification, I can insert a call to print(), allowing me to look at the funs object being passed to mutate:
mutate_if <- function (.tbl, .predicate, .funs, ...)
{
funs <- dplyr:::manip_if(.tbl, .predicate, .funs, enquo(.funs), caller_env(),
...)
print(funs)
}
Then, running your code will use this modified mutate_if function:
df <- structure(list(x = 1:10,
y = c(1L, 2L, 3L, NA, 1L, 2L, 3L, NA, 1L, 2L),
z = c(NA, 2L, 3L, NA, NA, 2L, 3L, NA, NA, 2L),
a = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e")),
.Names = c("x", "y", "z", "a"),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
df_lookup <- tibble(x = 0L, y = 5L, z = 8L)
df %>%
mutate_if(is.numeric, funs({
x <- .
x <- enquo(x)
lookup_value <- df_lookup %>% pull(quo_name(x))
x <- ifelse(is.na(x), lookup_value, x)
return(x)
}))
#> $x
#> <quosure>
#> expr: ^{
#> x <- x
#> x <- enquo(x)
#> lookup_value <- df_lookup %>% pull(quo_name(x))
#> x <- ifelse(is.na(x), lookup_value, x)
#> return(x)
#> }
#> env: 0000000007FBBFA0
#>
#> $y
#> <quosure>
#> expr: ^{
#> x <- y
#> x <- enquo(x)
#> lookup_value <- df_lookup %>% pull(quo_name(x))
#> x <- ifelse(is.na(x), lookup_value, x)
#> return(x)
#> }
#> env: 0000000007FBBFA0
#>
#> $z
#> <quosure>
#> expr: ^{
#> x <- z
#> x <- enquo(x)
#> lookup_value <- df_lookup %>% pull(quo_name(x))
#> x <- ifelse(is.na(x), lookup_value, x)
#> return(x)
#> }
#> env: 0000000007FBBFA0
Now we can see that, in the function list being passed to the mutate call, the name of the column has already been substituted for the . variable. This means that, within the statement, there is a variable called x, y, or z, the value of which comes from df.
Consider the simple case:
library(rlang)
x <- 1:10
quo(x)
#> <quosure>
#> expr: ^x
#> env: 0000000007615318
enquo(x)
#> <quosure>
#> expr: ^<int: 1L, 2L, 3L, 4L, 5L, ...>
#> env: empty
From this, hopefully you can extrapolate why you want to use quo rather than enquo. You are after the column name, which is the name of the variable, and that is what quo gives you.
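The same distinction can be seen inside an ordinary function. The following is a minimal sketch, not taken from the original post: show_both, arg, and col are hypothetical names chosen for illustration. quo() captures the expression exactly as written in the function body, while enquo() captures whatever the caller supplied for the argument.

```r
library(rlang)

# Hypothetical helper: report the name seen by quo() and by enquo()
show_both <- function(arg) {
  list(
    quo   = quo_name(quo(arg)),    # the expression as written here: arg
    enquo = quo_name(enquo(arg))   # the expression the caller passed in
  )
}

col <- 1:3
res <- show_both(col)
res$quo    # "arg"
res$enquo  # "col"
```

Inside funs(), the . has already been replaced by the column symbol before the code runs, so there is no caller-supplied argument for enquo() to look up; quo() on the substituted symbol is what yields the column name.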
Thus, using quo instead of enquo and not assigning it to a variable first:
mutate_if(df, is.numeric, funs({
lookup_value <- df_lookup %>% pull(quo_name(quo(.)))
ifelse(is.na(.), lookup_value, .)
}))
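As a side note, funs() has been deprecated since dplyr 0.8.0. A possible alternative, assuming dplyr 1.0.0 or later is available, is across() together with cur_column(), which returns the name of the column currently being processed. This is a sketch under that assumption rather than part of the original answers; the name filled is introduced here for illustration, and the data is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)

# Same sample data as in the question, rebuilt with tibble()
df <- tibble(
  x = 1:10,
  y = c(1L, 2L, 3L, NA, 1L, 2L, 3L, NA, 1L, 2L),
  z = c(NA, 2L, 3L, NA, NA, 2L, 3L, NA, NA, 2L),
  a = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e")
)
df_lookup <- tibble(x = 0L, y = 5L, z = 8L)

# For every numeric column, replace NA with the value stored under
# the same column name in df_lookup
filled <- df %>%
  mutate(across(where(is.numeric),
                ~ coalesce(.x, df_lookup[[cur_column()]])))

filled
```

Because cur_column() hands back the name as a plain string, no quosure handling is needed for the lookup.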