This should be a simple issue but I am struggling.
I have a vector of variable names that I want to exclude from a data frame:
library(dplyr)
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")
I would have thought that just excluding the object in the select statement with -
would have worked:
select(df, -excluded_vars)
But I get the following error:
Error in -excluded_vars : invalid argument to unary operator
The same is true when using select_().
Any ideas?
The call to one_of() is what evaluates your argument, so select() sees the names stored in excluded_vars rather than looking for a column called excluded_vars.
In recent versions of dplyr, the safest way to do this is:
select(df, -any_of(excluded_vars))
This will not break if excluded_vars contains a name that does not exist in df.
With select_(), you could simply use setdiff() to work out which names to keep.
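As a rough illustration of the two approaches mentioned above, the sketch below drops the columns with any_of() and then with setdiff(). The name "variable_9" is a hypothetical non-existent column added only to show that neither approach fails on it, and the setdiff() variant uses base R subsetting rather than select_(), which is deprecated in current dplyr.

```r
library(dplyr)

set.seed(1)  # only so the example is reproducible
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# "variable_9" is a hypothetical name that is not a column of df
excluded_vars <- c("variable_1", "variable_3", "variable_9")

# any_of() silently skips names that are missing from df
kept_any <- select(df, -any_of(excluded_vars))
names(kept_any)

# Base R alternative: compute the names to keep with setdiff()
kept_base <- df[, setdiff(names(df), excluded_vars), drop = FALSE]
names(kept_base)
```

Both calls should leave variable_2, variable_4 and variable_5.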
Tidyverse selections implement a dialect of R in which operators make it easy to select variables:
: selects a range of consecutive variables.
! takes the complement of a set of variables.
& and | select the intersection or the union of two sets of variables.
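A small sketch of these operators, assuming dplyr 1.0.0 or later (earlier versions do not support ! in selections). The toy data frame and the result names (range_sel and so on) are made up for illustration.

```r
library(dplyr)

# Toy data frame with columns variable_1 ... variable_5
df <- data.frame(matrix(1:10, nrow = 2, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# ":" selects a range of consecutive columns
range_sel <- select(df, variable_2:variable_4)

# "!" takes the complement, here everything except excluded_vars
complement_sel <- select(df, !any_of(excluded_vars))

# "&" intersects two selections, "|" takes their union
both_sel <- select(df, starts_with("variable") & !ends_with("5"))
either_sel <- select(df, ends_with("1") | ends_with("5"))
```

For the question above, the complement form !any_of(excluded_vars) is the relevant one; it selects the same columns as -any_of(excluded_vars).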
You need to use the one_of function:
select(df, -one_of(excluded_vars))
See the section on Useful Functions in the dplyr documentation for select for more about selecting based on variable names.
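As of a more recent version of dplyr, the following also works, though newer releases warn that a bare external vector is ambiguous and recommend wrapping it in all_of() or any_of() instead:
select(df, -excluded_vars)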
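If you would rather be told when a name in the vector is wrong, all_of() is the strict counterpart of any_of(). The sketch below assumes a small made-up data frame, and "variable_9" is a hypothetical column name that does not exist.

```r
library(dplyr)

df <- data.frame(variable_1 = 1:3, variable_2 = 4:6, variable_3 = 7:9)
excluded_vars <- c("variable_1", "variable_3")

# all_of() is strict: every name in the vector must exist in df
kept <- select(df, -all_of(excluded_vars))
names(kept)

# With a name that is not in df, all_of() raises an error,
# which is caught here so the script keeps running
res <- tryCatch(
  select(df, -all_of(c(excluded_vars, "variable_9"))),
  error = function(e) "error"
)
res
```

Using all_of() also removes the ambiguity of the bare form, which would pick up a column named excluded_vars if df happened to have one.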