
select columns based on multiple strings with dplyr contains()

I want to select multiple columns based on their names with a regular expression, using the piping syntax of the dplyr package. I checked other questions, but only found answers about matching a single string.

With base R:

library(dplyr)    
mtcars[grepl('m|ar', names(mtcars))]
###                      mpg am gear carb
### Mazda RX4           21.0  1    4    4
### Mazda RX4 Wag       21.0  1    4    4

However, it doesn't work with select() and contains():

mtcars %>% select(contains('m|ar'))
### data frame with 0 columns and 32 rows

What's wrong?

agenis asked Mar 12 '15 19:03

3 Answers

You can use matches(), which takes a regular expression:

 mtcars %>%
        select(matches('m|ar')) %>%
        head(2)
 #              mpg am gear carb
 #Mazda RX4      21  1    4    4
 #Mazda RX4 Wag  21  1    4    4

According to the ?select documentation:

‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’

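contains(), by contrast, matches a literal string, so 'm|ar' is searched for verbatim rather than interpreted as a regex. It works with a single plain string: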

mtcars %>% 
       select(contains('m'))
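
To see the difference side by side, here is a minimal sketch that compares only the selected column names. It assumes a reasonably recent dplyr/tidyselect, since older versions may not accept a character vector in contains(); the variable names literal, regex, vec and base are just illustrative.

```r
library(dplyr)

# contains() looks for the literal text "m|ar", which no column name includes
literal <- mtcars %>% select(contains("m|ar")) %>% names()

# matches() interprets "m|ar" as a regular expression: "m" OR "ar"
regex <- mtcars %>% select(matches("m|ar")) %>% names()

# contains() with a character vector (assumes a recent tidyselect):
# a column is kept if it contains any of the literal strings
vec <- mtcars %>% select(contains(c("m", "ar"))) %>% names()

# base R equivalent of the regex selection
base <- names(mtcars)[grepl("m|ar", names(mtcars))]

print(literal)
print(regex)
print(vec)
print(base)
```

With mtcars, the regex, vector and base R versions should all pick the same four columns, while the literal version picks none.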
akrun answered Oct 13 '22 18:10


You can also use contains() from dplyr if you give it a character vector of literal strings; it selects the columns that contain any of them:

mtcars %>% 
       select(contains(c("m", "ar")))
Nicki Norris answered Oct 13 '22 17:10


You could still use grepl() from base R.

df <- mtcars[ , grepl('m|ar', names(mtcars))]

...which returns a subset data frame, df, containing the columns with "m" or "ar" in their names.
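
One thing to watch if you switch between grepl() and matches(): grepl() is case-sensitive by default, while matches() defaults to ignore.case = TRUE (as in the documentation quoted in the first answer). A small sketch with a made-up data frame (toy and its column names are hypothetical, chosen only to show the difference):

```r
library(dplyr)

# Hypothetical data frame with mixed-case column names
toy <- data.frame(Mass = 1, arm = 2, other = 3)

# grepl() is case-sensitive by default, so "Mass" is not matched by "m"
base_cs <- names(toy)[grepl("m|ar", names(toy))]

# matches() ignores case by default, so "Mass" is matched as well
dplyr_ci <- toy %>% select(matches("m|ar")) %>% names()

# Make the two agree by setting ignore.case explicitly on either side
dplyr_cs <- toy %>% select(matches("m|ar", ignore.case = FALSE)) %>% names()
base_ci  <- names(toy)[grepl("m|ar", names(toy), ignore.case = TRUE)]

print(base_cs)
print(dplyr_ci)
```

For mtcars it makes no difference, because all of its column names are lowercase.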

Muthoni Thiong'o answered Oct 13 '22 19:10