How to use regular expressions with dplyr's select helper functions

Tags: regex, r, dplyr

It is straightforward to use dplyr to select columns using various helper functions, such as contains(). In the help file for these functions the argument is referred to as a 'literal string'. However, is it possible to use regular expressions instead?

The following example works:

library(dplyr)
iris %>%
   select(contains("Species"))

The following regex example does not:

# Select all column names that end with lower case "s"
iris %>%
   select(contains("s$"))

# Returns no columns:
#> data frame with 0 columns and 150 rows

I would like to know whether regular expressions can be used in dplyr's select helper functions and, if so, how.

If this isn't possible, I will accept an answer using an alternative method (e.g., base or data.table). For background, my ultimate aim is to use a summarise_at() function or equivalent to sum all columns that end in a number (i.e., regexp [0-9]$).

RDavey asked Aug 22 '19 08:08


2 Answers

The select helper function matches() is available to match regular expressions:

library(dplyr)

out <- select(iris, matches("s$"))

head(out)
#>   Species
#> 1  setosa
#> 2  setosa
#> 3  setosa
#> 4  setosa
#> 5  setosa
#> 6  setosa
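
The question's stated goal (summing every column whose name ends in a digit) can be sketched the same way. This is a minimal illustration, not a definitive solution: the data frame `df` and its column names are invented for the example, and `across()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data; the column names are made up for illustration
df <- data.frame(
  id    = c("a", "b", "c"),
  x1    = c(1, 2, 3),
  x2    = c(10, 20, 30),
  total = c(5, 5, 5)
)

# matches() takes a regular expression, so "[0-9]$" picks x1 and x2
sums <- df %>%
  summarise(across(matches("[0-9]$"), sum))

sums
```

With older dplyr versions, `summarise_at(df, vars(matches("[0-9]$")), sum)` should give the same result. Note that matches() ignores case by default (ignore.case = TRUE), so a strict lower-case match such as the original "s$" example needs matches("s$", ignore.case = FALSE).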
Joris C. answered Nov 15 '22 04:11


With dplyr, one can use ends_with(), which matches a literal suffix rather than a regular expression:

iris %>%
  select(ends_with("s")) %>%
  head(3)
  Species
1  setosa
2  setosa
3  setosa
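
Because ends_with() matches literal suffixes, it cannot take a pattern like [0-9]$ directly. It does accept a character vector of suffixes, though, so the "ends in a number" case can be approximated by passing the ten digits. A small sketch, using an invented data frame `df`:

```r
library(dplyr)

# Hypothetical data; the column names are made up for illustration
df <- data.frame(
  id    = c("a", "b", "c"),
  x1    = c(1, 2, 3),
  x2    = c(10, 20, 30),
  total = c(5, 5, 5)
)

# A vector of literal suffixes "0" to "9"; the union of matches is selected
digit_cols <- df %>%
  select(ends_with(as.character(0:9)))

names(digit_cols)
```

For anything more complex than a fixed set of suffixes, matches() with a regular expression is probably the simpler choice.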

With base R and grepl():

head(iris[grepl("s$", names(iris), ignore.case = FALSE)])
  Species
1  setosa
2  setosa
3  setosa
4  setosa
5  setosa
6  setosa
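
The same base R approach extends to the question's goal of summing columns that end in a number. A minimal sketch, assuming an invented data frame `df` whose digit-suffixed columns are all numeric:

```r
# Hypothetical data; the column names are made up for illustration
df <- data.frame(
  id    = c("a", "b", "c"),
  x1    = c(1, 2, 3),
  x2    = c(10, 20, 30),
  total = c(5, 5, 5)
)

# Logical index over the column names, TRUE where the name ends in a digit
ends_in_digit <- grepl("[0-9]$", names(df))

# colSums() returns a named numeric vector, one sum per selected column
totals <- colSums(df[ends_in_digit])

totals
```

colSums() requires numeric columns, so a non-numeric column whose name ends in a digit would need to be excluded first.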

Or using purrr:

iris %>%
  purrr::keep(grepl("s$", names(.))) %>%
  head()
  Species
1  setosa
2  setosa
3  setosa
4  setosa
5  setosa
6  setosa
NelsonGon answered Nov 15 '22 03:11