I would like to correct a column in a data.frame by subtracting from it another column with nearly identical name, but this other column has a suffix. I would like to use the mutate_at
function for this.
Trying to figure this out, I have struggled to access the name of a column in the function part of mutate_at, to the use it to access the other column.
I show this in a small example below, but basically I would like to access the name of the column used at the moment .
and then select from the data in the pipe a column that has the same name as .
but with a suffix (below that would be "_new"
).
Thanks for your help!
Here is an example of how I would have liked to do it - but this does not work.
library(tidyverse)
data("mtcars")
new <- mtcars/4
names(new) <-paste0(names(new),"_new")
df <- bind_cols(mtcars,new)
df %>%
mutate_at(.vars = vars(carb,disp),
.funs = list(corrected = ~ . - df %>% pull(paste0(names(.),"_new"))))
df %>% pull(paste0("carb","_new"))
To select a column in R you can use brackets e.g., YourDataFrame['Column'] will take the column named “Column”. Furthermore, we can also use dplyr and the select() function to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B') will take the columns named “A” and “B” from the dataframe.
6.4 dplyr basics filter() : pick observations by their values. select() : pick variables by their names. mutate() : create new variables with functions of existing variables. summarise() : collapse many values down to a single summary.
Instead of using mutate_at
why not use mutate
combined with across
and cur_column
i.e:
df %>%
mutate( across( c(carb,disp), ~ . - pull(df, paste0(cur_column(), "_new") ), .names = "{.col}_corrected") )
As the others already pointed out, variable names cannot be accessed in mutate_at
and this holds also true for the coming mutate(across())
. I addressed this issue as feature request for dplyr
here, but apparently, this kind of data-wrangling task is too specialized for dplyr
. Below I provide my favorite work-around for this kind of data-wrangling problems, which consists of two steps:
!! rlang::sym()
to generate variable based on a character vector of variable namespurrr::reduce
.library(tidyverse)
# your toy data
df <- mtcars %>%
as_tibble %>%
mutate_all(list(new =~ ./4))
# step 1: generate helper function, in this case a simple `mutate` call
gen_corrected <- function(df, x) {
mutate(df,
"{x}_corrected" := !! rlang::sym(x) - !! rlang::sym(str_c(x, "_new"))
)
}
# step 2:
# use purrr's `reduce` on the vector of vars you want to change
# the vector of variables can be defined in a separate step
# important: you need to set `.init = .`
df %>%
purrr::reduce(c('carb', 'disp'), gen_corrected, .init = .)
#> # A tibble: 32 x 24
#> mpg cyl disp hp drat wt qsec vs am gear carb mpg_new
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 5.25
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 5.25
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 5.7
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 5.35
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 4.68
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 4.53
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 3.58
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 6.1
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 5.7
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 4.8
#> # … with 22 more rows, and 12 more variables: cyl_new <dbl>, disp_new <dbl>,
#> # hp_new <dbl>, drat_new <dbl>, wt_new <dbl>, qsec_new <dbl>, vs_new <dbl>,
#> # am_new <dbl>, gear_new <dbl>, carb_new <dbl>, carb_corrected <dbl>,
#> # disp_corrected <dbl>
Created on 2020-05-21 by the reprex package (v0.3.0)
In the github issue mention above @Romain Francois provides another work-around to this issue.
We can't access the names
of the .
inside mutate_at
because it is a vector
and doesn't have the column name info. One option is map2
library(purrr)
library(dplyr)
library(stringr)
nm1 <- c('carb', 'disp')
map_dfc(nm1, ~ df %>%
transmute(!!str_c(.x, '_corrected') :=
!! rlang::sym(.x) - !! rlang::sym(str_c(.x, "_new"))))%>%
bind_cols(df, .) %>%
head
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_new cyl_new disp_new hp_new drat_new wt_new qsec_new
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5.250 1.5 40.00 27.50 0.9750 0.65500 4.1150
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5.250 1.5 40.00 27.50 0.9750 0.71875 4.2550
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5.700 1.0 27.00 23.25 0.9625 0.58000 4.6525
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5.350 1.5 64.50 27.50 0.7700 0.80375 4.8600
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 4.675 2.0 90.00 43.75 0.7875 0.86000 4.2550
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 4.525 1.5 56.25 26.25 0.6900 0.86500 5.0550
# vs_new am_new gear_new carb_new carb_corrected disp_corrected
#1 0.00 0.25 1.00 1.00 3.00 120.00
#2 0.00 0.25 1.00 1.00 3.00 120.00
#3 0.25 0.25 1.00 0.25 0.75 81.00
#4 0.25 0.00 0.75 0.25 0.75 193.50
#5 0.00 0.00 0.75 0.50 1.50 270.00
#6 0.25 0.00 0.75 0.25 0.75 168.75
Or we can update with a for
loop
for(nm in nm1) {
df <- df %>%
mutate(!! str_c(nm, '_corrected') :=
!! rlang::sym(nm) - !! rlang::sym(str_c(nm, '_new')))
}
Or using base R
with lapply
df[paste0(nm1, "_corrected")] <- lapply(nm1, function(nm)
df[nm] - df[paste0(nm, "_new")])
You can use Map
in base R or map2
from purrr
for this :
cols <- c('carb', 'disp')
df[paste0(cols, '_corrected')] <- Map(`-`, df[cols], df[paste0(cols, '_new')])
Using map2
library(purrr)
df[paste0(cols, '_corrected')] <- map2(df[cols], df[paste0(cols, '_new')], `-`)
Here is a base R solution. We can define a function, diff_col
, that creates the operation and then use a for loop to repeatedly apply this function to modify the data frame.
diff_col <- function(col, dat){
dat[[paste0(col, "_corrected")]] <- dat[[col]] - dat[[paste0(col, "_new")]]
return(dat)
}
for (name in c("carb", "disp")){
df <- diff_col(col = name, dat = df)
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With