
dplyr / tidy way to filter a vector based on a substring?

Tags: r, dplyr

There are good examples of how to filter a data.frame based on a substring; is there a tidy way to do this for a vector (that is, without using grepl() or similar)?

Example

I tried what would work on a data.frame:

# Leave only words that don't begin with 'cat'

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

vec %>% filter(substr(1, 3) != "cat") # %>% ... etc

but

Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "character"

Note

We could use something like vec %>% { .[!grepl("cat", .)] }, or more accurately vec %>% { .[substr(., 1, 3) != "cat"] }, but I am looking for something that:

  1. is more beginner friendly, with more verbally descriptive functions (e.g. a complete novice can probably guess what 'filter' does but possibly not 'grepl')
  2. has less finicky syntax (as few { and } as possible)
  3. pipes more elegantly (e.g. vec %>% filter(...) %>% next operations)
  4. contains as little repetition as possible, noting that the grepl way uses the original vector (denoted by .) twice (as opposed to just once which would be ideal)
stevec asked Jan 04 '20 06:01


2 Answers

I think the tidyverse is better suited to data frames and lists than to plain vectors. Pipes help when you chain several operations, but here a single function (grep) gives the expected result without any pipes.

grep('^cat', vec, value = TRUE, invert = TRUE)
#[1] "dog"   "mouse"
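Since the goal is a literal prefix rather than a regular expression, base R's startsWith() may read more naturally to a beginner. The sketch below also wraps it in a small helper so that the vector is named only once; discard_if is a hypothetical name made up for this illustration, not a function from base R or dplyr.

```r
vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

# startsWith() does literal prefix matching, so no regex anchor (^) is needed
kept <- vec[!startsWith(vec, "cat")]
print(kept)

# Hypothetical helper (an assumption for illustration, not an existing API):
# drops the elements for which predicate(x, ...) is TRUE
discard_if <- function(x, predicate, ...) x[!predicate(x, ...)]

kept_helper <- discard_if(vec, startsWith, "cat")
print(kept_helper)
```

Because the helper takes the vector as its first argument, it should also sit in a pipe, for example vec %>% discard_if(startsWith, "cat").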

Or convert the vector to a data frame and then use either of the following:

library(dplyr)
library(tibble)

vec %>% enframe() %>% filter(!startsWith(value, 'cat'))

Or

vec %>% enframe() %>% filter_at(vars(value), any_vars(!startsWith(., 'cat')))
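Both data frame versions return a tibble rather than a vector. As a rough sketch, assuming the goal is to end up with a character vector again, pull() can extract the value column at the end of the pipe. If adding stringr is acceptable (it is not used elsewhere in this answer), str_subset() with negate = TRUE works on the vector directly; that argument needs stringr 1.4.0 or later.

```r
library(dplyr)
library(tibble)
library(stringr)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

# Round trip: vector -> tibble -> filter rows -> back to a character vector
via_tibble <- vec %>%
  enframe() %>%
  filter(!startsWith(value, "cat")) %>%
  pull(value)

# stringr alternative: keep the elements that do NOT match the regex ^cat
via_stringr <- vec %>% str_subset("^cat", negate = TRUE)

print(via_tibble)
print(via_stringr)
```

The stringr version names the vector once and needs no braces, which seems closest to what the question asks for.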
Ronak Shah answered Oct 14 '22 08:10


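If you don't mind using a different package, you can use the stri_detect_fixed function from the stringi package. Note that the example below keeps the elements that contain 'cat' anywhere in the string; put ! in front of the condition to drop them instead.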

install.packages('stringi')
library(stringi)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")
vec[stri_detect_fixed(vec, 'cat')]

Output:

[1] "cat"       "catamaran" "catacombs"

You should then be able to pipe the result into whatever commands you like.
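To match the question more closely (dropping only the words that begin with 'cat'), a minimal sketch could use stri_startswith_fixed from the same package and negate the result:

```r
library(stringi)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

# TRUE where the string begins with the literal prefix "cat"
has_prefix <- stri_startswith_fixed(vec, "cat")

# Keep only the elements that do not start with "cat"
no_cat_prefix <- vec[!has_prefix]
print(no_cat_prefix)
```

Unlike stri_detect_fixed, this would leave a word such as "bobcat" in place, because the match is anchored to the start of the string.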

bgaerber answered Oct 14 '22 08:10