select() in dplyr 0.7.5 returns a different result from dplyr 0.7.4 when using a named vector to specify columns.
library(dplyr)
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
print(df)
#> a b c
#> 1 1 6 11
#> 2 2 7 12
#> 3 3 8 13
#> 4 4 9 14
#> 5 5 10 15
# a named vector
cols <- c(x = 'a', y = 'b', z = 'c')
print(cols)
#> x y z
#> "a" "b" "c"
# with dplyr 0.7.4
# returns column names with vector values
select(df, cols)
#> a b c
#> 1 1 6 11
#> 2 2 7 12
#> 3 3 8 13
#> 4 4 9 14
#> 5 5 10 15
# with dplyr 0.7.5
# returns column names with vector names
select(df, cols)
#> x y z
#> 1 1 6 11
#> 2 2 7 12
#> 3 3 8 13
#> 4 4 9 14
#> 5 5 10 15
Is this a bug or a feature?
IMO it could have been considered a bug in 0.7.4, and is now fixed / more user-friendly.
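If you need the same result regardless of which of the two versions is installed, one option is to make the intent explicit instead of relying on how select() treats the vector names. The following is only a minimal sketch using the df and cols from the question; the kept and renamed variables are illustrative names, and newer tidyselect releases may warn about passing an external vector directly and recommend all_of() instead.

```r
library(dplyr)

df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
cols <- c(x = "a", y = "b", z = "c")

# Keep the original column names (the 0.7.4 result):
# drop the vector names before selecting.
kept <- select(df, unname(cols))
names(kept)
#> [1] "a" "b" "c"

# Rename to the vector names (the 0.7.5 result),
# spelled out explicitly in base R.
renamed <- setNames(df[unname(cols)], names(cols))
names(renamed)
#> [1] "x" "y" "z"
```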
With the move to tidyselect, the logic has become a little more sophisticated.
If you compare dplyr::select_vars to the new tidyselect::vars_select (these are the variants used by dplyr:::select.data.frame in 0.7.4 and 0.7.5 respectively), you can find that the line below was losing the names for the named & quoted (strings) case in 0.7.4:
ind_list <- map_if(ind_list, is_character, match_var, table = vars)
# example:
dplyr:::select.data.frame(mtcars, c(a = "mpg", b = "disp"))
Note that this is not an issue of named vectors in general, as the typical unquoted case was always fine:
dplyr:::select.data.frame(mtcars, c(a = mpg, b = disp))
# (here the names are indeed "a" and "b" afterwards)
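The name loss in the quoted case can be reproduced in plain base R. This is a sketch under the assumption that match_var looks the strings up in a way comparable to base match(); it is not taken from the dplyr source, and sel and pos are illustrative names.

```r
vars <- names(mtcars)
sel <- c(a = "mpg", b = "disp")

# match() returns the positions of the strings in `vars`,
# but it does not carry over the names of `sel`.
pos <- match(sel, vars)
pos
#> [1] 1 3
names(pos)
#> NULL
```

Once the positions have replaced the strings, nothing downstream can know that the user asked for "a" and "b".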
There is a line of code that handles the usage of c():
ind_list <- map_if(ind_list, !is_helper, eval_tidy, data = names_list)
eval_tidy is from the rlang package, and in the line above would return the following for the problematic call:
[[1]]
a b
"mpg" "disp"
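To see the difference between the two cases outside of dplyr, the evaluation step can be imitated directly. This is a rough sketch that assumes names_list maps each column name to its position; here it is rebuilt by hand for mtcars rather than taken from dplyr, and unquoted and quoted are illustrative names.

```r
library(rlang)

vars <- names(mtcars)
# Assumed shape of names_list: column name -> column position.
names_list <- setNames(as.list(seq_along(vars)), vars)

# Unquoted symbols are found in the data, so c() receives positions
# and keeps the names a and b.
unquoted <- eval_tidy(quote(c(a = mpg, b = disp)), data = names_list)
unquoted
#> a b
#> 1 3

# Strings are not looked up at all; they come back as a named
# character vector, which is_character then picks up afterwards.
quoted <- eval_tidy(quote(c(a = "mpg", b = "disp")), data = names_list)
is.character(quoted)
#> [1] TRUE
```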
Now with tidyselect, we have some extra handling, see https://github.com/tidyverse/tidyselect/blob/master/R/vars-select.R.
In particular, vars_select_eval has the following line, where it is handling the usage of c():
ind_list <- map_if(quos, !is_helper, overscope_eval_next, overscope = overscope)
overscope_eval_next is again from the rlang package and calls the same routine as eval_tidy would, but it receives an overscope variant of c() that handles strings (through the overscope argument). See tidyselect:::vars_c.
So after this line, the c(a = "mpg", b = "disp") case becomes the same as c(a = mpg, b = disp):
[[1]]
a b # these are the names
1 3 # these are the positions of the selected cols
is_character then does not hold anymore in subsequent code, as opposed to above with rlang::eval_tidy.
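The idea of an overscoped c() can be imitated in base R. The sketch below is not the tidyselect implementation: c_sketch and scope are hypothetical names, and base eval() with a list stands in for the overscope mechanism.

```r
vars <- names(mtcars)

# Hypothetical stand-in for an overscoped c(): strings are looked up
# in `vars`, and the argument names are carried over to the positions.
c_sketch <- function(...) {
  x <- c(...)
  if (is.character(x)) {
    pos <- match(x, vars)
    names(pos) <- names(x)
    return(pos)
  }
  x
}

# Evaluation scope: c() is replaced by the sketch, and every column
# name is bound to its position.
scope <- c(list(c = c_sketch), setNames(as.list(seq_along(vars)), vars))

from_strings <- eval(quote(c(a = "mpg", b = "disp")), scope)
from_symbols <- eval(quote(c(a = mpg, b = disp)), scope)

from_strings
#> a b
#> 1 3
identical(from_strings, from_symbols)
#> [1] TRUE
```

Because the strings are already converted to named positions inside c(), no later step has to treat character input specially.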
If you look at these functions in rlang, the fact that overscope_eval_next is soft-deprecated in favor of eval_tidy might confuse you given the above. My guess is that tidyselect just hasn't been "cleaned up" in this respect yet (naming inconsistencies etc. would have to be addressed as well, so it's a rewrite of more than just the one line with the call). In the end, eval_tidy can now be used in the same way and probably will be.