 

dplyr: select all variables except for those contained in vector


This should be a simple issue but I am struggling.

I have a vector of variable names that I want to exclude from a data frame:

library(dplyr)

df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

excluded_vars <- c("variable_1", "variable_3")

I would have thought that simply negating the vector with - inside select() would work:

select(df, -excluded_vars)

But I get the following error:

Error in -excluded_vars : invalid argument to unary operator

The same is true when using select_().

Any ideas?

Shinobi_Atobe asked Mar 27 '18 14:03



People also ask

What does %>% do in dplyr?

%>% is the forward-pipe operator in R. It chains commands by forwarding a value, or the result of an expression, into the next function call or expression, so x %>% f(y) is equivalent to f(x, y). It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
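A minimal sketch, assuming dplyr 1.0.0 or later (which re-exports %>% and provides all_of()), showing that the piped and the nested forms of the question's selection produce the same data frame:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# x %>% f(y) is the same call as f(x, y): df becomes select()'s first argument
piped  <- df %>% select(-all_of(excluded_vars))
nested <- select(df, -all_of(excluded_vars))

identical(piped, nested)  # TRUE
names(piped)              # "variable_2" "variable_4" "variable_5"
```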

How do I select variables in dplyr?

To select a variable by name with dplyr, pass the data frame followed by the unquoted name of the variable (column) you want to keep to select(), e.g. select(df, variable_2).
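A short sketch using the question's data (assuming dplyr is installed). Note that select() always returns a data frame, even for a single column; pull() is the dplyr verb that extracts one column as a plain vector:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# Keep one column, referred to by its bare name
one_col <- select(df, variable_2)
names(one_col)  # "variable_2"

# pull() returns the column's values as a vector instead of a data frame
values <- pull(df, variable_2)
length(values)  # 10
```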

Which of the following functions in dplyr package can be used to choose variables using their names?

select() and rename(): both refer to variables by their names; select() keeps only the variables you list, while rename() changes names and keeps every variable.
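A minimal sketch of the difference, using the question's data frame (the new name first is purely illustrative). Both verbs take new_name = old_name pairs:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# rename() changes the given name and keeps all other columns
renamed <- rename(df, first = variable_1)
names(renamed)   # "first" "variable_2" "variable_3" "variable_4" "variable_5"

# select() can also rename, but drops every column that is not listed
selected <- select(df, first = variable_1)
names(selected)  # "first"
```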

How to select variables or columns in R using dplyr library?

select(dataframe, columns) returns only the listed columns of the input data frame. Columns can be specified by name, or by their position in the data frame.
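A short sketch of selection by position on the question's data (assuming dplyr is installed). Positions work with ranges and with negation just like names do:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# Second column only
by_position <- select(df, 2)
names(by_position)  # "variable_2"

# A range of positions
by_range <- select(df, 2:4)
names(by_range)     # "variable_2" "variable_3" "variable_4"

# Negative positions drop columns
dropped <- select(df, -c(1, 3))
names(dropped)      # "variable_2" "variable_4" "variable_5"
```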

How do I use the selection language in dplyr?

The selection language can be used in functions like dplyr::select() or tidyr::pivot_longer(). Select a single variable by its bare name, or several variables by separating their names with commas.
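A minimal sketch of selecting several variables by name (assuming dplyr is installed); the order of the arguments also sets the order of the resulting columns:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# Several variables, separated by commas
several <- select(df, variable_1, variable_4)
names(several)    # "variable_1" "variable_4"

# Listing them in a different order reorders the columns
reordered <- select(df, variable_4, variable_1)
names(reordered)  # "variable_4" "variable_1"
```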

What is the difference between select_ and one_of in dplyr?

The call to one_of() is what evaluates your character vector as column names. In more recent versions of dplyr, select(df, -any_of(excluded_vars)) is the safest way to do this: the code will not break if excluded_vars contains a name that does not exist in df. With the now-deprecated select_(), you could simply use setdiff().
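A sketch of both approaches, assuming dplyr 1.0.0 or later for any_of() and all_of(). The name variable_9 is hypothetical and deliberately absent from df, to show that neither approach fails on it:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# "variable_9" is a hypothetical name that is not a column of df
excluded_vars <- c("variable_1", "variable_3", "variable_9")

# any_of() silently skips names that df does not have
kept <- select(df, -any_of(excluded_vars))
names(kept)  # "variable_2" "variable_4" "variable_5"

# setdiff() approach: compute the names to keep, then select exactly those
keep_vars <- setdiff(names(df), excluded_vars)
kept_setdiff <- select(df, all_of(keep_vars))

# The same idea in base R, without dplyr
kept_base <- df[keep_vars]
names(kept_base)  # "variable_2" "variable_4" "variable_5"
```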

What are tidyverse selections in R?

Tidyverse selections implement a dialect of R in which operators make it easy to select variables: the : operator selects a range of consecutive variables, ! takes the complement of a set of variables, and & and | select the intersection or the union of two sets of variables.
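A sketch of each operator on the question's data, assuming dplyr 1.0.0 or later (older versions do not support !, & and | inside select()):

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# : a range of consecutive variables
range_sel <- select(df, variable_2:variable_4)
names(range_sel)   # "variable_2" "variable_3" "variable_4"

# ! the complement of a set of variables
complement <- select(df, !c(variable_1, variable_3))
names(complement)  # "variable_2" "variable_4" "variable_5"

# & the intersection: columns 1 to 4 that are not in excluded_vars
both <- select(df, variable_1:variable_4 & !all_of(excluded_vars))
names(both)        # "variable_2" "variable_4"

# | the union of two sets
either <- select(df, variable_1 | variable_5)
names(either)      # "variable_1" "variable_5"
```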


2 Answers

You need to use the one_of function:

select(df, -one_of(excluded_vars))

See the section on Useful Functions in the dplyr documentation for select for more about selecting based on variable names.
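A runnable version of this answer on the question's data (assuming dplyr is installed). In current tidyselect, one_of() is superseded by all_of() and any_of(), but it still works:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# one_of() turns the character vector into a column selection; - drops it
dropped <- select(df, -one_of(excluded_vars))
names(dropped)  # "variable_2" "variable_4" "variable_5"
```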

C. Braun answered Sep 22 '22 15:09



In more recent versions of dplyr, the following works:

select(df, -excluded_vars)
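Recent tidyselect versions flag a bare external vector like this as ambiguous (df could itself contain a column named excluded_vars) and emit a message or deprecation warning recommending all_of(). A minimal sketch of the explicit form, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# Example data from the question
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# all_of() marks excluded_vars as an external character vector of column
# names; it is documented as strict about missing names, unlike any_of()
dropped <- select(df, -all_of(excluded_vars))
names(dropped)       # "variable_2" "variable_4" "variable_5"

# Equivalent, using the ! (complement) operator instead of -
dropped_bang <- select(df, !all_of(excluded_vars))
names(dropped_bang)  # "variable_2" "variable_4" "variable_5"
```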
Shinobi_Atobe answered Sep 18 '22 15:09
