
How NOT to select columns using dplyr's select() when you have a character vector of column names?

Tags:

r

dplyr

I have been trying to deselect columns in my dataset using dplyr since last night, without success.

I am well aware of workarounds, but I am strictly looking for an answer that uses only dplyr.

library(dplyr)
df <- tibble(x = c(1,2,3,4), y = c('a','b','c','d'))
df %>% select(-c('x'))

This gives me an error: Error in -c("x") : invalid argument to unary operator

I know that select() takes unquoted column names, but I am not able to deselect columns this way when the names are quoted strings.

Please note that the dataset above is just an example; the real data can have many columns.

Thanks,

Prerit

Slayer asked Mar 30 '18 23:03



1 Answer

Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that:

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

cols <- c("Petal.Length", "Sepal.Length")

select(iris, one_of(cols)) %>% colnames

# [1] "Petal.Length" "Sepal.Length"

select(iris, -one_of(cols)) %>% colnames

# [1] "Sepal.Width" "Petal.Width" "Species"
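On newer installations the same idea can be written with all_of() and any_of(), which supersede one_of(). The following is a minimal sketch, assuming dplyr 1.0.0 or later; the "not_a_column" name is made up purely to show the lenient behaviour of any_of(), and the last lines reuse the tibble from the question.

```r
library(dplyr)

cols <- c("Petal.Length", "Sepal.Length")

# all_of() is strict: it errors if any name in `cols` is missing from the data
kept <- iris %>% select(all_of(cols)) %>% colnames()
kept
# [1] "Petal.Length" "Sepal.Length"

# Negate with ! (or -) to drop the listed columns instead
dropped <- iris %>% select(!all_of(cols)) %>% colnames()
dropped
# [1] "Sepal.Width" "Petal.Width" "Species"

# any_of() is lenient: names that do not exist in the data are ignored
# ("not_a_column" is a hypothetical name used only for illustration)
lenient <- iris %>% select(-any_of(c(cols, "not_a_column"))) %>% colnames()
lenient
# [1] "Sepal.Width" "Petal.Width" "Species"

# Applied to the tibble from the question
df <- tibble(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"))
df_names <- df %>% select(-all_of("x")) %>% colnames()
df_names
# [1] "y"
```

Choosing between the two is mostly a question of whether a missing column name should be treated as a mistake (all_of) or tolerated (any_of).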

You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:

starts_with(): starts with a prefix

ends_with(): ends with a suffix

contains(): contains a literal string

matches(): matches a regular expression

num_range(): a numerical range like x01, x02, x03.

one_of(): variables in character vector.

everything(): all variables.
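As a rough illustration of the helpers listed above, here is a short sketch run against iris. The `wide` tibble and its column names are invented only to demonstrate num_range(); everything else uses the built-in dataset.

```r
library(dplyr)

# starts_with(): columns whose names begin with "Sepal"
starts <- iris %>% select(starts_with("Sepal")) %>% colnames()
# [1] "Sepal.Length" "Sepal.Width"

# ends_with(): columns whose names end in "Width"
ends <- iris %>% select(ends_with("Width")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width"

# contains(): columns whose names contain the literal string "etal"
has <- iris %>% select(contains("etal")) %>% colnames()
# [1] "Petal.Length" "Petal.Width"

# matches(): columns whose names match a regular expression
rx <- iris %>% select(matches("^P.*Width$")) %>% colnames()
# [1] "Petal.Width"

# num_range(): numbered columns such as x01, x02 (hypothetical data)
wide <- tibble(x01 = 1, x02 = 2, x03 = 3, y = 4)
nums <- wide %>% select(num_range("x", 1:2, width = 2)) %>% colnames()
# [1] "x01" "x02"

# everything(): all remaining columns, handy for moving one column to the front
front <- iris %>% select(Species, everything()) %>% colnames()
# [1] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```

Any of these helpers can be negated with a leading minus sign to drop the matching columns instead of keeping them.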


Given a dataframe with columns named a through z, pipe it into select() like this:

select(-a, -b, -c, -d, -e)

# OR

select(-c(a, b, c, d, e))

# OR

select(-(a:e))

# OR if you want to keep b

select(-a, -(c:e))

# OR a different way to keep b, by just putting it back in

select(-(a:e), b)
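The forms above can be tried on a throwaway data frame. This is only a sketch: `df` is a made-up one-row data frame with columns a through z, built just for this check.

```r
library(dplyr)

# Hypothetical one-row data frame with columns named a, b, ..., z
df <- as.data.frame(setNames(as.list(1:26), letters))

# Three equivalent ways to drop columns a through e
v1 <- df %>% select(-a, -b, -c, -d, -e) %>% colnames()
v2 <- df %>% select(-c(a, b, c, d, e)) %>% colnames()
v3 <- df %>% select(-(a:e)) %>% colnames()

# Drop a through e but keep b, by skipping it in the exclusions
keep_b1 <- df %>% select(-a, -(c:e)) %>% colnames()

# Drop a through e, then add b back in
keep_b2 <- df %>% select(-(a:e), b) %>% colnames()

# Both "keep b" variants retain the same set of columns
setequal(keep_b1, keep_b2)
```

The two "keep b" variants retain the same columns, though the position of b in the result may differ between them.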

So if I wanted to omit two of the columns from the iris dataset, I could say:

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species" 

When the column names share a pattern, a more concise way to achieve that is to use one of select's helper functions:

select(iris, -ends_with(".Length")) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species"   

P.S. It's unusual to pass quoted values to dplyr; one of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare column names work fine with dplyr and ggplot2.

DuckPyjamas answered Oct 18 '22 19:10