Is it possible to refer to column names in a lambda function inside across()?
df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))
df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[COLNAME]])))
I just came across this question, where the OP asks about validating columns in a data frame against a list of allowed values. dplyr just gained across(), which seems like a natural choice, but we need the column names to look up the allowed values. The best I could come up with was a call to imap_dfr, but that is more cumbersome to integrate into an analysis pipeline because the results need to be recombined with the original data frame.
across() returns a tibble with one column for each column in .
%>% is the forward pipe operator in R. It forwards a value, or the result of an expression, into the next function call or expression, which lets you chain commands. It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
The answer is yes, you can refer to column names in dplyr's across(). You need to use cur_column(). Your original attempt was so close! Insert cur_column() into your solution where you want the column name:
library(tidyverse)
df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))
df %>%
  mutate(
    across(c(age, sex),
           c(valid = ~ .x %in% allowed_values[[cur_column()]]))
  )
Reference: https://dplyr.tidyverse.org/articles/colwise.html#current-column
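For anyone who wants to see what this produces, here is a minimal sketch of the same idea. It assumes a recent dplyr (1.0.0 or later, where cur_column() is available); the explicit .names pattern and the variable name checked are my own choices for illustration, not part of the original answer.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

checked <- df %>%
  mutate(across(
    c(age, sex),
    # cur_column() returns the name of the column currently being processed,
    # so it can be used to look up the matching entry in allowed_values
    ~ .x %in% allowed_values[[cur_column()]],
    # keep the original columns and add age_valid and sex_valid next to them
    .names = "{.col}_valid"
  ))

names(checked)
```

The original columns stay in place, so the validity flags can be used directly in a later filter() or summarise() step without re-joining anything.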
I think that you may be asking too much of across at this point (but this may spur additional development, so maybe someday it will work the way you suggest). I think that the imap functions from the purrr package may give you what you want at this point:
> df <- tibble(age = c(12, 45), sex = c('f', 'f'))
> allowed_values <- list(age = 18:100, sex = c("f", "m"))
>
> df %>% imap( ~ .x %in% allowed_values[[.y]])
$age
[1] FALSE  TRUE

$sex
[1] TRUE TRUE

> df %>% imap_dfc( ~ .x %in% allowed_values[[.y]])
# A tibble: 2 x 2
  age   sex
  <lgl> <lgl>
1 FALSE TRUE
2 TRUE  TRUE
If you want a single column with the combined validity, you can pass the result through reduce:
> df %>% imap( ~ .x %in% allowed_values[[.y]]) %>%
+ reduce(`&`)
[1] FALSE TRUE
This could then be added as a new column to the original data, or just used for subsetting the data. I am not expert enough with the tidyverse yet to know whether this could be combined with mutate to add the columns directly.
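One way the combined check might be attached to the original data is to compute it first and then hand the resulting vector to mutate. This is only a sketch under that assumption; the names checks, result, and all_valid are made up for illustration.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# One logical vector per column; .y is the column name supplied by imap()
checks <- imap(df, ~ .x %in% allowed_values[[.y]])

# Combine the per-column checks element-wise and add them as a new column
result <- df %>%
  mutate(all_valid = reduce(checks, `&`))

# Or use the combined check to keep only the fully valid rows
valid_rows <- df %>%
  filter(reduce(checks, `&`))
```

Computing checks outside the pipeline keeps the mutate call simple, at the cost of one intermediate object.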