I have 18 pairs of variable and I would like to do pair-wise math on them to calculate 18 new variables. The across() function in dplyr is quite handy when applying a formula to one column. Is there a way to apply across() to pairs of columns? Tiny example with simple division of 2 variables (my actual code will be more complex, some ifelse, ...): <pre class="prettyprint"><code>library(tidyverse) library(glue) # filler data df <- data.frame("label" = c('a','b','c','d'), "A" = c(4, 3, 8, 9), "B" = c(10, 0, 4, 1), "error_A" = c(0.4, 0.3, 0.2, 0.1), "error_B" = c(0.3, 0, 0.4, 0.1)) # what I want to have in the end # instead of just 2 (A, B), I have 18 df1 <- df %>% mutate( 'R_A' = A/error_A, 'R_B' = B/error_B ) # what I'm thinking about doing to use both variables A and error_A to calculate the new column df2 <- df %>% mutate( across(c('A','B'), ~.x/{HOW DO I USE THE COLUMN WHOSE NAME IS glue('error_',.x)} .names = 'R_{.col}' ) </code></pre>

One option is <code>map/reduce</code>. Specify the columns of interest ('nm1'), loop over them in <code>map</code>, <code>select</code> those columns from the dataset, <code>reduce</code> by dividing, <code>rename</code> the columns after column binding (<code>_dfc</code>), and bind those with the original dataset <pre class="prettyprint"><code>library(dplyr) library(purrr) library(stringr) nm1 <- c('A', 'B') map_dfc(nm1, ~ df %>% select(ends_with(.x)) %>% reduce(., `/`) ) %>% rename_all(~ str_c('R_', nm1)) %>% bind_cols(df, .) </code></pre> -output <pre class="prettyprint"><code># label A B error_A error_B R_A R_B #1 a 4 10 0.4 0.3 10 33.33333 #2 b 3 0 0.3 0.0 10 NaN #3 c 8 4 0.2 0.4 40 10.00000 #4 d 9 1 0.1 0.1 90 10.00000 </code></pre> <hr> Or another option with <code>across</code> <pre class="prettyprint"><code>df %>% mutate(across(c(A, B), ~ ./get(str_c('error_', cur_column() )), .names = 'R_{.col}' )) # label A B error_A error_B R_A R_B #1 a 4 10 0.4 0.3 10 33.33333 #2 b 3 0 0.3 0.0 10 NaN #3 c 8 4 0.2 0.4 40 10.00000 #4 d 9 1 0.1 0.1 90 10.00000 </code></pre>

I like akruns answer above, especially the approach with <code>cur_column()</code>. Interestingly, <code>cur_column()</code> can't be used with {rlang}'s evaluation (<code>!! sym(paste0("error_", cur_column()))</code>), but <code>get</code> is a nice workaround. Just to add one more approach, which also works under dpylr < 1.0.0. I usually use a <code>mutate</code> custom function together with <code>purrr::reduce()</code>. In this function <code>x</code> is your string stem and you construct all variables you want to access with <code>!! sym(paste0(...))</code>. On the left hand-side you can just use {rlang}'s glue syntax. You apply this custom function by calling <code>reduce()</code> on the vector of strings and your <code>data.frame</code> goes in the <code>.init = .</code> argument. <pre class="prettyprint lang-r prettyprint-override"><code>library(tidyverse) library(glue) # filler data df <- data.frame("label" = c('a','b','c','d'), "A" = c(4, 3, 8, 9), "B" = c(10, 0, 4, 1), "error_A" = c(0.4, 0.3, 0.2, 0.1), "error_B" = c(0.3, 0, 0.4, 0.1)) gen_vars1 <- function(df, x) { mutate(df, "R_{x}" := !! sym(x) / !! sym(paste0("error_", x))) } df %>% reduce(c("A", "B"), gen_vars1, .init = .) #> label A B error_A error_B R_A R_B #> 1 a 4 10 0.4 0.3 10 33.33333 #> 2 b 3 0 0.3 0.0 10 NaN #> 3 c 8 4 0.2 0.4 40 10.00000 #> 4 d 9 1 0.1 0.1 90 10.00000 </code></pre> Created on 2021-01-02 by the reprex package (v0.3.0) I once opened a feature request for this kind of problem, but apparently it is too special case for {dplyr}. When you follow the link you can also find another option to do this kind of operation.

Can you use dplyr across() to iterate across pairs of columns?

Tags:

r

dplyr

tidyverse

across

I have 18 pairs of variable and I would like to do pair-wise math on them to calculate 18 new variables. The across() function in dplyr is quite handy when applying a formula to one column. Is there a way to apply across() to pairs of columns?

Tiny example with simple division of 2 variables (my actual code will be more complex, some ifelse, ...):

library(tidyverse)
library(glue)

# filler data
df <- data.frame("label" = c('a','b','c','d'),
                 "A" = c(4, 3, 8, 9),
                 "B" = c(10, 0, 4, 1),
                 "error_A" = c(0.4, 0.3, 0.2, 0.1),
                 "error_B" = c(0.3, 0, 0.4, 0.1))

# what I want to have in the end 
# instead of just 2 (A, B), I have 18
df1 <- df %>% mutate(
  'R_A' = A/error_A,
  'R_B' = B/error_B
)

# what I'm thinking about doing to use both variables A and error_A to calculate the new column
df2 <- df %>% mutate(
  across(c('A','B'),
         ~.x/{HOW DO I USE THE COLUMN WHOSE NAME IS glue('error_',.x)}
         .names = 'R_{.col}'
)

401

asked Jan 02 '21 20:01

nefosl

4 Answers

One option is map/reduce. Specify the columns of interest ('nm1'), loop over them in map, select those columns from the dataset, reduce by dividing, rename the columns after column binding (_dfc), and bind those with the original dataset

library(dplyr)
library(purrr)
library(stringr)
nm1 <- c('A', 'B')
map_dfc(nm1, ~ df %>% 
                select(ends_with(.x)) %>% 
                reduce(., `/`) ) %>%
    rename_all(~ str_c('R_', nm1)) %>%
    bind_cols(df, .)

-output

#  label A  B error_A error_B R_A      R_B
#1     a 4 10     0.4     0.3  10 33.33333
#2     b 3  0     0.3     0.0  10      NaN
#3     c 8  4     0.2     0.4  40 10.00000
#4     d 9  1     0.1     0.1  90 10.00000

Or another option with across

df %>% 
    mutate(across(c(A, B), ~ 
     ./get(str_c('error_', cur_column() )), .names = 'R_{.col}' ))
#  label A  B error_A error_B R_A      R_B
#1     a 4 10     0.4     0.3  10 33.33333
#2     b 3  0     0.3     0.0  10      NaN
#3     c 8  4     0.2     0.4  40 10.00000
#4     d 9  1     0.1     0.1  90 10.00000

107

answered Oct 17 '22 08:10

akrun

One option could be:

df %>%
 mutate(across(c(A, B), .names = "R_{col}")/across(starts_with("error")))

  label A  B error_A error_B R_A      R_B
1     a 4 10     0.4     0.3  10 33.33333
2     b 3  0     0.3     0.0  10      NaN
3     c 8  4     0.2     0.4  40 10.00000
4     d 9  1     0.1     0.1  90 10.00000

answered Oct 17 '22 07:10

tmfmnk

I like akruns answer above, especially the approach with cur_column(). Interestingly, cur_column() can't be used with {rlang}'s evaluation (!! sym(paste0("error_", cur_column()))), but get is a nice workaround.

Just to add one more approach, which also works under dpylr < 1.0.0. I usually use a mutate custom function together with purrr::reduce(). In this function x is your string stem and you construct all variables you want to access with !! sym(paste0(...)). On the left hand-side you can just use {rlang}'s glue syntax.

You apply this custom function by calling reduce() on the vector of strings and your data.frame goes in the .init = . argument.

library(tidyverse)
library(glue)


# filler data
df <- data.frame("label" = c('a','b','c','d'),
                 "A" = c(4, 3, 8, 9),
                 "B" = c(10, 0, 4, 1),
                 "error_A" = c(0.4, 0.3, 0.2, 0.1),
                 "error_B" = c(0.3, 0, 0.4, 0.1))

gen_vars1 <- function(df, x) {
  
  mutate(df,
         "R_{x}" := !! sym(x) / !! sym(paste0("error_", x)))
}

df %>% 
  reduce(c("A", "B"), gen_vars1, .init = .)
#>   label A  B error_A error_B R_A      R_B
#> 1     a 4 10     0.4     0.3  10 33.33333
#> 2     b 3  0     0.3     0.0  10      NaN
#> 3     c 8  4     0.2     0.4  40 10.00000
#> 4     d 9  1     0.1     0.1  90 10.00000

^{Created on 2021-01-02 by the reprex package (v0.3.0)}

I once opened a feature request for this kind of problem, but apparently it is too special case for {dplyr}. When you follow the link you can also find another option to do this kind of operation.

answered Oct 17 '22 08:10

TimTeaFan

For such cases I find base R solution straight-forward and efficient too. It doesn't require to loop over columns or unique values. You define two groups of column and divide them directly.

For the example that you have shared we can identify "A" and "B" columns by looking for columns names that have only one character in them.

cols <- grep('^.$', names(df), value = TRUE)
error_cols <- grep('error', names(df), value = TRUE)

df[paste0('R_', cols)] <- df[cols]/df[error_cols]
df

#  label A  B error_A error_B R_A  R_B
#1     a 4 10     0.4     0.3  10 33.3
#2     b 3  0     0.3     0.0  10  NaN
#3     c 8  4     0.2     0.4  40 10.0
#4     d 9  1     0.1     0.1  90 10.0

answered Oct 17 '22 09:10

Ronak Shah

Related questions
                            
                                R: fastest way to check presence of each element of a vector in each of the columns of a matrix
                            
                                Export all user inputs in a Shiny app to file and load them later
                            
                                matrix by matrix in R markdown
                            
                                R: How can I export methods provided by a package to a PSOCK cluster?
                            
                                Using ggplot to plot occupied parking spaces in a parking lot
                            
                                Progressive concatenation of a column by a group [duplicate]
                            
                                Create list of vectors using rep()
                            
                                Finding the max of a R dataframe column ignoring -Inf and NA
                            
                                Displaying separate means within fill groups in ggplot boxplot
                            
                                Combine plots that have a legend with one that doesn't
                            
                                How to render flexdashboard from the command line?
                            
                                Line height spacing for text in ggplot
                            
                                R function to convert object into code
                            
                                Simplify this grid such that each row and column has 1 value
                            
                                Make Shiny icon smaller in Valuebox
                            
                                DT::datatable – Format selected column?
                            
                                RMarkdown - Change Inline Code Color
                            
                                Count consecutive elements in a same length vector
                            
                                How to cope with a singular fit in a linear mixed model (lme4)?
                            
                                I am trying to install openssl package in R using Ubuntu 18.04 without success

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With