I want to select multiple columns by name using a regular expression, and I am trying to do it with the piping syntax of the dplyr package. I checked the other topics, but only found answers about matching a single string.
With base R:
library(dplyr)
mtcars[grepl('m|ar', names(mtcars))]
### mpg am gear carb
### Mazda RX4 21.0 1 4 4
### Mazda RX4 Wag 21.0 1 4 4
However, it doesn't work with select() and contains():
mtcars %>% select(contains('m|ar'))
### data frame with 0 columns and 32 rows
What's wrong?
To pick out one or more columns, use the select() function. select() expects a data frame as its first input ('argument', in R terminology), followed by the names of the columns you want to extract, separated by commas.
One way to select a variable from a data frame is to pass the data frame and the column name as arguments to dplyr's select(). For example, select(penguins, species) picks the species column from the penguins data frame.
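As a minimal sketch of that calling pattern: the data frame below, penguins_demo, is a hypothetical stand-in with made-up values, used only so the snippet runs without installing the package that provides the real penguins data.

```r
library(dplyr)

# Hypothetical stand-in for the penguins data; the values are made up for illustration
penguins_demo <- data.frame(
  species     = c("Adelie", "Gentoo", "Chinstrap"),
  island      = c("Torgersen", "Biscoe", "Dream"),
  body_mass_g = c(3750, 5000, 3700)
)

# Data frame first, then the column to keep
one_col <- select(penguins_demo, species)

# Several columns, separated by commas
two_cols <- select(penguins_demo, species, island)

names(one_col)
names(two_cols)
```

The same calls can be written with the pipe, e.g. penguins_demo %>% select(species), which is the form used in the question.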
You can use matches(), which takes a regular expression:
mtcars %>%
select(matches('m|ar')) %>%
head(2)
# mpg am gear carb
#Mazda RX4 21 1 4 4
#Mazda RX4 Wag 21 1 4 4
According to the ?select documentation:
‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’
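contains(), by contrast, matches a literal string rather than a regular expression, so 'm|ar' is searched for verbatim and no column name contains it. It does work with a single literal string: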
mtcars %>%
select(contains('m'))
You can also use contains() from the dplyr package if you give it a vector of literal strings, like this:
mtcars %>%
select(contains(c("m", "ar")))
You could still use grepl() from base R.
df <- mtcars[ , grepl('m|ar', names(mtcars))]
...which returns a subset data frame, df, containing the columns with m or ar in their names.
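To compare the approaches from the answers side by side, here is a small sketch that runs all three on mtcars and checks that they pick the same columns. The variable names (by_regex, by_literal, by_base) are only illustrative.

```r
library(dplyr)

# Regular expression: any column name containing "m" or "ar"
by_regex <- mtcars %>% select(matches("m|ar"))

# Literal strings: the union of names containing "m" and names containing "ar"
by_literal <- mtcars %>% select(contains(c("m", "ar")))

# Base R: logical index built from the column names
by_base <- mtcars[, grepl("m|ar", names(mtcars)), drop = FALSE]

names(by_regex)
setequal(names(by_regex), names(by_literal))
setequal(names(by_regex), names(by_base))
```

One difference to keep in mind: matches() and contains() ignore case by default (ignore.case = TRUE), whereas grepl() is case-sensitive unless you pass ignore.case = TRUE. The mtcars column names are all lowercase, so the results agree here.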