I noticed an unexpected behavior of mutate_at. Suppose I have a data frame and a vector of columns I want to mutate, like:
library(dplyr)

# tibble() replaces the deprecated data_frame()
df1 <- tibble(var1 = c(1, 2, 3, 4, 5, 6),
              var2 = c(1, 1, 1, 2, 2, 2),
              var3 = c(10, 30, 50, 70, 90, 110))
variables <- c("var1", "var2")
I now apply mutate_at to create new factor versions of the columns defined in variables. By naming the function "cat" in the list, I make sure the old versions are kept and the new versions are named after the old version plus "_cat":
df1 %>% mutate_at(vars(variables), .funs = list(cat = as.factor))
# A tibble: 6 x 5
var1 var2 var3 var1_cat var2_cat
<dbl> <dbl> <dbl> <fct> <fct>
1 1 1 10 1 1
2 2 1 30 2 1
3 3 1 50 3 1
4 4 2 70 4 2
5 5 2 90 5 2
6 6 2 110 6 2
However, if I apply mutate_at to only one column (in my case, my variables vector has only one element), the name of the new variable is only "cat":
variables <- c("var1")
df1 %>% mutate_at(vars(variables), .funs = list(cat = as.factor))
# A tibble: 6 x 4
var1 var2 var3 cat
<dbl> <dbl> <dbl> <fct>
1 1 1 10 1
2 2 1 30 2
3 3 1 50 3
4 4 2 70 4
5 5 2 90 5
6 6 2 110 6
On some level, I understand why mutate_at is doing this: if you want to name one mutated column in any special way, just use mutate, like mutate(var1_cat = as.factor(var1)).
However, in my case, I want to run the mutate_at operation over a number of data frames, for each of which I have a vector of columns to change. Crucially, these vectors might have only one element. So, would it not be better for mutate_at to show the same naming behavior no matter how many vars it receives?
I don't think this is the expected behavior (or at least it shouldn't be), and the good news is that the newest version of dplyr gets rid of it. Currently you can install it using remotes::install_github('tidyverse/dplyr'), but it should be on CRAN in the coming month or two.
mutate_at and the other scoped verbs (mutate_if, summarize_all, etc.) have been replaced by the use of across within existing verbs, and this provides the behavior you are looking for.
library(dplyr)
variables <- c("var1", "var2")
df1 %>%
  mutate(across(all_of(variables), .fns = list(cat = as.factor)))
#> # A tibble: 6 x 5
#> var1 var2 var3 var1_cat var2_cat
#> <dbl> <dbl> <dbl> <fct> <fct>
#> 1 1 1 10 1 1
#> 2 2 1 30 2 1
#> 3 3 1 50 3 1
#> 4 4 2 70 4 2
#> 5 5 2 90 5 2
#> 6 6 2 110 6 2
variables <- c("var1")
df1 %>%
  mutate(across(all_of(variables), .fns = list(cat = as.factor)))
#> # A tibble: 6 x 4
#> var1 var2 var3 var1_cat
#> <dbl> <dbl> <dbl> <fct>
#> 1 1 1 10 1
#> 2 2 1 30 2
#> 3 3 1 50 3
#> 4 4 2 70 4
#> 5 5 2 90 5
#> 6 6 2 110 6
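If you would rather not rely on the name given in the function list, across also takes a .names argument, a glue-style template in which {.col} stands for the input column name. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the name result is only for illustration.

```r
library(dplyr)

df1 <- tibble(var1 = c(1, 2, 3, 4, 5, 6),
              var2 = c(1, 1, 1, 2, 2, 2),
              var3 = c(10, 30, 50, 70, 90, 110))

variables <- c("var1")

# .names is a glue template; {.col} is replaced by each selected column name,
# so the suffix is applied the same way for one column or for many.
result <- df1 %>%
  mutate(across(all_of(variables), as.factor, .names = "{.col}_cat"))

names(result)
```

With variables holding a single element, this should add a var1_cat column next to the originals, just as it would add one new column per element for a longer vector.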
Session info
sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17763)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252
#> [2] LC_CTYPE=English_United Kingdom.1252
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_0.8.99.9001
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.3 knitr_1.28 magrittr_1.5 tidyselect_1.0.0
#> [5] R6_2.4.1 rlang_0.4.5.9000 fansi_0.4.1 stringr_1.4.0
#> [9] highr_0.8 tools_3.6.3 xfun_0.12 utf8_1.1.4
#> [13] cli_2.0.2 htmltools_0.4.0 ellipsis_0.3.0 assertthat_0.2.1
#> [17] yaml_2.2.1 digest_0.6.25 tibble_2.1.3 lifecycle_0.2.0
#> [21] crayon_1.3.4 purrr_0.3.3 vctrs_0.2.99.9010 glue_1.3.2
#> [25] evaluate_0.14 rmarkdown_2.1 stringi_1.4.6 compiler_3.6.3
#> [29] pillar_1.4.3 pkgconfig_2.0.3
Not sure if there is an easy solution to this. However, one way would be to apply the function based on the length of variables.
library(dplyr)

if (length(variables) > 1) {
  df1 %>% mutate_at(vars(variables), list(cat = as.factor))
} else {
  df1 %>% mutate(!!paste0(variables, "_cat") := as.factor(!!sym(variables)))
}
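Another option that avoids the branch altogether is plain base R assignment, which names the new columns explicitly. This is only a sketch, not dplyr's own mechanism: add_cat_columns, dfs, cols and results are hypothetical names made up for the example, and it assumes cols contains at least one column name.

```r
# Hypothetical helper: add a "<col>_cat" factor copy of each column in cols.
# Assumes cols has at least one element.
add_cat_columns <- function(df, cols) {
  new_names <- paste0(cols, "_cat")
  # df[cols] is always a data frame (a list of columns), even for one column,
  # so lapply() returns one factor per selected column.
  df[new_names] <- lapply(df[cols], as.factor)
  df
}

df1 <- data.frame(var1 = c(1, 2, 3, 4, 5, 6),
                  var2 = c(1, 1, 1, 2, 2, 2),
                  var3 = c(10, 30, 50, 70, 90, 110))

names(add_cat_columns(df1, c("var1", "var2")))
names(add_cat_columns(df1, "var1"))

# Run over several data frames, each paired with its own vector of columns.
dfs  <- list(a = df1, b = df1)
cols <- list(a = c("var1", "var2"), b = "var3")
results <- Map(add_cat_columns, dfs, cols)
```

Because the new names are built with paste0 rather than inferred by mutate_at, a one-element vector is handled the same way as a longer one.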