Suppose we have a character vector cols_to_select containing some columns we want to select from a dataframe df, e.g.
df <- tibble::tibble(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3) # tibble::data_frame() is deprecated in favour of tibble::tibble()
cols_to_select <- c("b", "d")
Suppose also we want to use dplyr::select because it's part of an operation that uses %>%, so using select makes the code easy to read.
There seem to be a number of ways in which this can be achieved, but some are more robust than others. Could you please let me know which is the 'correct' version and why? Or perhaps there is another, better way?
dplyr::select(df, cols_to_select) #Fails if 'cols_to_select' happens to be the name of a column in df
dplyr::select(df, !!cols_to_select) # i.e. using UQ()
dplyr::select(df, !!!cols_to_select) # i.e. using UQS()
cols_to_select_syms <- rlang::syms(c("b", "d")) #See [here](https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures/44657171#44657171)
dplyr::select(df, !!!cols_to_select_syms)
p.s. I realise this can be achieved in base R using simply df[,cols_to_select]
There is an example with dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:
dplyr::select(df, !!cols_to_select)
Why? Let's explore the options you mention:
dplyr::select(df, cols_to_select)
As you say, this fails if cols_to_select happens to be the name of a column in df, so this is wrong.
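To make that failure concrete, here is a minimal sketch. The data frame df_clash is a hypothetical name introduced only for illustration, and the behaviour described assumes a reasonably recent dplyr, where a bare name is matched against the columns before any variable of the same name is considered.

```r
library(dplyr)

# Hypothetical data frame that happens to have a column called "cols_to_select"
df_clash <- tibble::tibble(b = 1:3, d = 1:3, cols_to_select = 4:6)
cols_to_select <- c("b", "d")

# The bare name is resolved as a column first, so this returns the column
# named "cols_to_select" rather than columns b and d
clash_result <- dplyr::select(df_clash, cols_to_select)
names(clash_result)
```

The call does not raise an error. It quietly returns a different column from the one intended, which is what makes this form risky.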
cols_to_select_syms <- rlang::syms(c("b", "d"))
dplyr::select(df, !!!cols_to_select_syms)
This looks more convoluted than the other solutions.
dplyr::select(df, !!cols_to_select)
dplyr::select(df, !!!cols_to_select)
These two solutions give the same result in this case. You can see the output of !!cols_to_select and !!!cols_to_select by doing:
dput(rlang::`!!`(cols_to_select)) # c("b", "d")
dput(rlang::`!!!`(cols_to_select)) # pairlist("b", "d")
The !! or UQ() operator evaluates its argument immediately in its context, and that is what you want.
The !!! or UQS() operator is used to pass multiple arguments at once to a function.
For character column names like in your example, it does not matter whether you give them as a single vector of length 2 (using !!) or as a list of two vectors of length one (using !!!). For more complex use cases you will need to pass multiple arguments as a list (using !!!):
a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b")) # namespaced so it runs without attaching rlang or dplyr
dplyr::select(df, !!a) # does not work
dplyr::select(df, !!!a) # does work
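As for "another, better way": a possible alternative, assuming dplyr 1.0.0 or later (newer than the versions discussed above), is the all_of() helper, which dplyr re-exports from tidyselect. The sketch below rebuilds the question's df with tibble::tibble(); the names strict_result and lenient_result and the column name "zzz" are made up for illustration.

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# all_of() marks cols_to_select as an external character vector, so it is not
# confused with a column of the same name; it errors if any name is missing
strict_result <- df %>% dplyr::select(dplyr::all_of(cols_to_select))

# any_of() is the lenient variant: names that are not in df are skipped
lenient_result <- df %>% dplyr::select(dplyr::any_of(c(cols_to_select, "zzz")))

names(strict_result)
names(lenient_result)
```

This keeps the pipe-friendly style the question asks for and needs no unquoting operator.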