
Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector

Suppose we have a character vector cols_to_select containing the names of some columns we want to select from a data frame df, e.g.

df <- tibble::tibble(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3)  # tibble() replaces the deprecated data_frame()
cols_to_select <- c("b", "d")

Suppose also that we want to use dplyr::select because the selection is part of a %>% pipeline, where select keeps the code easy to read.

There seem to be a number of ways in which this can be achieved, but some are more robust than others. Please could you let me know which is the 'correct' version and why? Or perhaps there is another, better way?

dplyr::select(df, cols_to_select) #Fails if 'cols_to_select' happens to be the name of a column in df 
dplyr::select(df, !!cols_to_select) # i.e. using UQ()
dplyr::select(df, !!!cols_to_select) # i.e. using UQS()

cols_to_select_syms <- rlang::syms(c("b", "d"))  #See [here](https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures/44657171#44657171)
dplyr::select(df, !!!cols_to_select_syms)

p.s. I realise this can be achieved in base R using simply df[,cols_to_select]

RobinL asked Jun 24 '17 18:06


1 Answer

There is an example with dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:

dplyr::select(df, !!cols_to_select)

Why? Let's explore the options you mention:

Option 1

dplyr::select(df, cols_to_select)

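As you say, this fails if cols_to_select happens to be the name of a column in df: select then picks that column rather than the columns named in the vector. The result depends on what the data frame happens to contain, so this option is not robust.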
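To make the ambiguity concrete, here is a minimal sketch. The data frame below is made up for illustration (it deliberately has a column called cols_to_select), and it assumes a reasonably recent dplyr is installed:

```r
library(dplyr)

# Hypothetical data: one column shares its name with the character vector
df_clash <- tibble::tibble(a = 1:3, b = 1:3, cols_to_select = 4:6)
cols_to_select <- c("a", "b")

# Bare name: the column in the data wins over the variable in the environment
ambiguous <- select(df_clash, cols_to_select)
names(ambiguous)

# Unquoting with !! inlines the vector's value before select() sees it
unquoted <- select(df_clash, !!cols_to_select)
names(unquoted)
```

The first call returns only the column named cols_to_select, while the second returns a and b, which is what was intended.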

Option 4

cols_to_select_syms <- rlang::syms(c("b", "d"))  
dplyr::select(df, !!!cols_to_select_syms)

This works, but it looks more convoluted than the other solutions because it needs an extra step to convert the strings to symbols.
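As a quick check that the extra step buys nothing for plain column names, this sketch compares Option 4 with the !! version on the data from the question (assuming dplyr and rlang are installed):

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# Option 4: convert the strings to symbols, then splice them in with !!!
cols_to_select_syms <- rlang::syms(cols_to_select)
via_syms <- select(df, !!!cols_to_select_syms)

# Option 2: unquote the character vector directly
via_vector <- select(df, !!cols_to_select)

# Both calls select the same columns
identical(names(via_syms), names(via_vector))
```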

Options 2 and 3

dplyr::select(df, !!cols_to_select)
dplyr::select(df, !!!cols_to_select)

These two solutions provide the same results in this case. You can see the output of !!cols_to_select and !!!cols_to_select by doing:

dput(rlang::`!!`(cols_to_select)) # c("b", "d")
dput(rlang::`!!!`(cols_to_select)) # pairlist("b", "d")
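My understanding is that more recent rlang releases only allow !! and !!! inside quoting functions, so the two dput() lines may raise an error there. A rough alternative is to look at the call that rlang::expr() builds. In this sketch f is a hypothetical placeholder function name; expr() only constructs the call and never runs it:

```r
cols_to_select <- c("b", "d")

# f is a placeholder and does not need to exist: expr() just builds the call
call_bang <- rlang::expr(f(!!cols_to_select))
call_splice <- rlang::expr(f(!!!cols_to_select))

print(call_bang)    # the whole character vector is passed as one argument
print(call_splice)  # each element becomes its own argument

# Number of arguments in each call (the first element is the function name)
length(call_bang) - 1
length(call_splice) - 1
```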

The !! or UQ() operator evaluates its argument immediately in its context, and that is what you want.

The !!! or UQS() operator is used to pass multiple arguments at once to a function: it splices the elements of a list or vector into the call, each as a separate argument.

For character column names like in your example, it does not matter whether you give them as a single vector of length 2 (using !!) or as a list of two vectors of length one (using !!!). For more complex use cases, you need to pass multiple arguments as a list, using !!!:

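a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b"))  # a list of two quoted selection helpers
dplyr::select(df, !!a) # does not work: the whole list is passed as one argument
dplyr::select(df, !!!a) # does work: each helper is spliced in as its own argument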
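Since the question asks about use inside a %>% pipeline and whether there is another way, here is a short sketch of both. It assumes a dplyr version that re-exports all_of() from tidyselect (1.0.0 or later, as far as I know), which is newer than the 0.7.0 release the question was written against:

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# Unquoting works the same way inside a pipeline
piped_bang <- df %>% select(!!cols_to_select)

# all_of() states explicitly that cols_to_select is a vector of column names
# held in the environment, so a column of the same name cannot shadow it
piped_all_of <- df %>% select(all_of(cols_to_select))

names(piped_bang)
names(piped_all_of)
```

Unlike the plain !! version, all_of() also raises an error if one of the names is missing from the data frame, which can be useful when the vector is built elsewhere.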
zeehio answered Oct 07 '22 08:10