I am trying to deselect columns in my dataset using dplyr, but I have not been able to get it working since last night.
I am well aware of workarounds, but I am strictly looking for an answer that uses only dplyr.
library(dplyr)
df <- tibble(x = c(1,2,3,4), y = c('a','b','c','d'))
df %>% select(-c('x'))
This gives me an error: Error in -c("x") : invalid argument to unary operator
Now, I know that select takes unquoted values, but I am not able to deselect columns with a quoted name in this fashion.
Please note that the above dataset is just an example; the real data can have many columns.
Thanks,
Prerit
To pick out single or multiple columns, use the select() function. select() expects a dataframe as its first input (an 'argument', in R terminology), followed by the names of the columns you want to extract, separated by commas.
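As a minimal sketch of that calling convention, using the built-in iris data (the object name `picked` is only illustrative and not part of the original answer):

```r
library(dplyr)

# First argument: the dataframe; remaining arguments: bare column names.
picked <- select(iris, Sepal.Length, Species)
colnames(picked)
# [1] "Sepal.Length" "Species"
```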
Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
cols <- c("Petal.Length", "Sepal.Length")
select(iris, one_of(cols)) %>% colnames
# [1] "Petal.Length" "Sepal.Length"
select(iris, -one_of(cols)) %>% colnames
# [1] "Sepal.Width" "Petal.Width" "Species"
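Applied to the tibble from the question, the same idea might look like the sketch below. The name `drop_cols` is a hypothetical stand-in for whatever character vector holds the columns to drop. The second call uses all_of(), which newer releases provide as a stricter alternative to one_of(); this assumes tidyselect 1.0.0 or later is installed.

```r
library(dplyr)

df <- tibble(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"))
drop_cols <- c("x")  # hypothetical name: character vector of columns to drop

# one_of() turns the character vector into a selection that can be negated.
kept_one_of <- df %>% select(-one_of(drop_cols))
colnames(kept_one_of)
# [1] "y"

# all_of() (assumed available: tidyselect >= 1.0.0) errors if a name is missing.
kept_all_of <- df %>% select(-all_of(drop_cols))
colnames(kept_all_of)
# [1] "y"
```

Because the column names live in an ordinary character vector, this scales to any number of columns without typing each one out.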
You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:
starts_with(): starts with a prefix
ends_with(): ends with a suffix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03.
one_of(): variables in character vector.
everything(): all variables.
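A few of these helpers in action, as a rough sketch on the built-in iris data. The small `numbered` tibble and all result names are made up here purely to illustrate num_range() and are not from the docs.

```r
library(dplyr)

# Columns whose names start with "Sepal".
sepal_cols <- select(iris, starts_with("Sepal")) %>% colnames()
# [1] "Sepal.Length" "Sepal.Width"

# Columns whose names contain the literal string "Width".
width_cols <- select(iris, contains("Width")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width"

# Columns whose names match a regular expression.
petal_cols <- select(iris, matches("^Petal")) %>% colnames()
# [1] "Petal.Length" "Petal.Width"

# everything() picks up the remaining columns, so this moves Species to the front.
species_first <- select(iris, Species, everything()) %>% colnames()

# num_range() builds names such as x01, x02 from a prefix and a numeric range.
numbered <- tibble(x01 = 1, x02 = 2, x03 = 3)  # hypothetical example data
first_two <- select(numbered, num_range("x", 1:2, width = 2)) %>% colnames()
# [1] "x01" "x02"
```

Any of these can be negated with a leading minus sign to drop the matching columns instead of keeping them.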
Given a dataframe with columns named a:z, use select like this:
select(-a, -b, -c, -d, -e)
# OR
select(-c(a, b, c, d, e))
# OR
select(-(a:e))
# OR if you want to keep b
select(-a, -(c:e))
# OR a different way to keep b, by just putting it back in
select(-(a:e), b)
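To try these patterns on real data, here is a small sketch. It builds a hypothetical one-row dataframe called `wide` with columns a through z (the name and the values are invented for illustration) and passes it explicitly as the first argument.

```r
library(dplyr)

# Hypothetical dataframe with one row and 26 columns named a, b, ..., z.
wide <- as.data.frame(setNames(as.list(1:26), letters))

# Drop the range a through e.
drop_a_to_e <- select(wide, -(a:e)) %>% colnames()

# Drop a and c through e, which keeps b in its original position.
keep_b <- select(wide, -a, -(c:e)) %>% colnames()

# Drop a through e, then add b back; b ends up as the last column.
b_last <- select(wide, -(a:e), b) %>% colnames()
```

Note the difference between the last two calls: both keep b, but adding it back after the negative selection moves it to the end of the result.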
So if I wanted to omit two of the columns from the iris dataset, I could say:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
But of course, the best and most concise way to achieve that is using one of select's helper functions:
select(iris, -ends_with(".Length")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
P.S. It's weird that you are passing quoted values to dplyr. One of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare values work fine with dplyr and ggplot2.