Referring to column names inside dplyr's across()

Tags: r, dplyr, tidyverse

Is it possible to refer to column names in a lambda function inside across()?

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[COLNAME]])))

I just came across this question where OP asks about validating columns in a dataframe based on a list of allowed values.

dplyr just gained across() and it seems like a natural choice, but we need the column names to look up the allowed values.

The best I could come up with was a call to imap_dfr, but that is more cumbersome to integrate into an analysis pipeline, because the results need to be recombined with the original dataframe.


asked Jun 02 '20 15:06

severin

2 Answers

The answer is yes, you can refer to column names inside dplyr's across(). You need to use cur_column(). Your original attempt was so close! Insert cur_column() into your code where you want the column name:

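library(tidyverse)

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# cur_column() returns the name of the column across() is currently processing,
# so it can be used to look up that column's entry in allowed_values
df %>%
  mutate(across(c(age, sex),
                list(valid = ~ .x %in% allowed_values[[cur_column()]])))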

Reference: https://dplyr.tidyverse.org/articles/colwise.html#current-column
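As a minimal sketch of the same idea, the variant below names the new columns through the `.names` argument of across() instead of a named function list. The object name `checked` and the `"{.col}_valid"` pattern are illustrative choices, not part of the original question.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# For each selected column, look up its allowed values by name via
# cur_column() and store the check in a new "<column>_valid" column.
checked <- df %>%
  mutate(across(c(age, sex),
                ~ .x %in% allowed_values[[cur_column()]],
                .names = "{.col}_valid"))

# checked now has the columns age, sex, age_valid and sex_valid
print(checked)
```

Because the result stays inside mutate(), the validity flags sit next to the original columns and nothing has to be joined back afterwards.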

answered Oct 18 '22 18:10

s_pike

I think that you may be asking too much of across at this point (but this may spur additional development, so maybe someday it will work the way you suggest).

I think that the imap functions from the purrr package may give you what you want at this point:

> df <- tibble(age = c(12, 45), sex = c('f', 'f'))
> allowed_values <- list(age = 18:100, sex = c("f", "m"))
> 
> df %>% imap( ~ .x %in% allowed_values[[.y]])
$age
[1] FALSE  TRUE

$sex
[1] TRUE TRUE

> df %>% imap_dfc( ~ .x %in% allowed_values[[.y]])
# A tibble: 2 x 2
  age   sex  
  <lgl> <lgl>
1 FALSE TRUE 
2 TRUE  TRUE

If you want a single column with the combined validity then you can pass the result through reduce:

> df %>% imap( ~ .x %in% allowed_values[[.y]]) %>%
+   reduce(`&`)
[1] FALSE  TRUE

This could then be added as a new column to the original data, or just used for subsetting the data. I am not expert enough with the tidyverse yet to know if this could be combined with mutate to add the columns directly.
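One way the combined result might be attached to the original data is sketched below. It assumes the per-column checks are computed first and then passed to mutate(); the names `checks`, `result`, `all_valid` and `valid_rows` are made up for illustration.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# One logical vector per column; .y is the column name used for the lookup
checks <- imap(df, ~ .x %in% allowed_values[[.y]])

# Collapse the per-column checks into a single row-wise flag and add it
result <- df %>% mutate(all_valid = reduce(checks, `&`))

# Alternatively, use the flag to keep only rows that pass every check
valid_rows <- result %>% filter(all_valid)
```

This keeps the purrr approach from above and only adds the step of storing the reduced vector as a column, so it works without relying on across().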

answered Oct 18 '22 19:10

Greg Snow
