
Using dplyr's select where variable names are quoted [duplicate]

Tags: r, dplyr

Often I want to select a subset of variables where the subset is the result of a function. In this simple case, I first get all the variable names that pertain to width characteristics:

library(dplyr)
library(magrittr)

data(iris)

width.vars <- iris %>% 
                names %>% 
                extract(grep(".Width", .))

Which returns:

> width.vars
[1] "Sepal.Width" "Petal.Width"

It would be useful to be able to use this vector to select columns. (I'm aware that contains() and its siblings exist, but there are plenty of more complicated subsets I would like to perform; this example is deliberately trivial.)

If I attempt to use this vector to select columns, the following happens:

iris %>% 
  select(Species,
         width.vars)

Error: All select() inputs must resolve to integer column positions.
The following do not:
*  width.vars

How can I use dplyr::select with a vector of variable names stored as strings?

tomw asked Oct 22 '15 15:10



2 Answers

Within dplyr, most verbs have an alternate version ending in an underscore that accepts strings as input; in this case, select_(). These are typically what you have to use when you are using dplyr programmatically.

iris %>% select_(.dots=c("Species",width.vars))
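
A note for readers on newer versions: the underscore verbs such as select_() were deprecated in dplyr 0.7.0. A minimal sketch of the same selection, assuming a dplyr version that exports the tidyselect helper all_of() (1.0.0 or later), might look like this:

```r
library(dplyr)

# Character vector of column names, built much as in the question;
# fixed = TRUE makes grep treat ".Width" as a literal string, not a regex
width.vars <- grep(".Width", names(iris), fixed = TRUE, value = TRUE)

# all_of() tells select() to treat the vector's contents as column names
# and raises an error if any of them is missing from the data
result <- iris %>% select(Species, all_of(width.vars))

names(result)
```

Wrapping the vector in all_of() also avoids ambiguity if the data frame happens to contain a column that is itself called width.vars.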
Craig answered Oct 21 '22 00:10


First of all, you can do the selection in dplyr with

iris %>% select(Species, contains(".Width"))

No need to create the vector of names separately. But if you did have a list of columns as string names, you could do

width.vars <- c("Sepal.Width", "Petal.Width")
iris %>% select(Species, one_of(width.vars))

See the ?select help page for all the available options.
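
For subsets that contains() cannot express, two other helpers may be worth a look. The sketch below assumes dplyr 1.0.0 or later, where one_of() is superseded by all_of() and any_of(); the column name "Stem.Width" is made up purely to illustrate how any_of() handles missing names.

```r
library(dplyr)

# matches() takes a regular expression; the dot is escaped so it is literal
by_regex <- iris %>% select(Species, matches("\\.Width$"))

# any_of() takes a character vector and silently skips names that are absent
# ("Stem.Width" is a hypothetical name, not a column of iris)
wanted <- c("Sepal.Width", "Petal.Width", "Stem.Width")
by_names <- iris %>% select(Species, any_of(wanted))

# Both approaches pick the same columns here
identical(names(by_regex), names(by_names))
```

Use any_of() when missing columns are acceptable and all_of() when a missing column should be treated as an error.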

MrFlick answered Oct 21 '22 02:10