
How to pass multiple necessary patterns to str_subset?

Tags: regex, r, stringr

I am trying to find elements in a character vector that match two words in no particular order, not just any single one of them, using the stringr::str_subset function. In other words, I'm looking for the intersection, not the union of the two words.

I tried using the "or" (|) operator, but this matches elements containing either one of the two words and returns too many results. I also tried passing a character vector with the two words as the pattern argument. This produces the warning "longer object length is not a multiple of shorter object length" and only returns the values that match the second one of the two words.

library(stringr)

character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")
pattern <- c("def", "pqr")

str_subset(character_vector, pattern)

I'm looking for the pattern that will return only the second element of the character vector, i.e. "pqr abc def".
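
To make the problem concrete, here is a small sketch of the `|` attempt described above. The name `union_hits` is only illustrative. Every element contains at least one of the two words, so alternation returns the whole vector rather than the single element wanted.

```r
library(stringr)

character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")

# "def|pqr" matches an element if it contains either word (the union)
union_hits <- str_subset(character_vector, "def|pqr")
print(union_hits)
```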

Tea Tree asked Jan 27 '23 07:01



2 Answers

One option is str_detect. Map over the elements of 'pattern' to get one logical vector of matches against 'character_vector' per pattern, combine those vectors elementwise with &, and use the resulting logical vector to extract the matching elements from 'character_vector'.

library(tidyverse)
# one logical vector per pattern, combined elementwise with `&`
map(pattern, str_detect, string = character_vector) %>%
    reduce(`&`) %>%
    magrittr::extract(character_vector, .)
#[1] "pqr abc def"

Or using str_subset

map(pattern, str_subset, string = character_vector) %>%
    reduce(intersect)
#[1] "pqr abc def"
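
If this comes up often, the str_detect approach can be wrapped in a small function. The following is only a sketch: `str_subset_all` is a hypothetical helper name, not part of stringr, and it uses base `lapply` and `Reduce` in place of purrr's `map` and `reduce`.

```r
library(stringr)

# Hypothetical helper (not part of stringr): keep the elements of `string`
# that match every pattern in `patterns`.
str_subset_all <- function(string, patterns) {
  # one logical vector per pattern
  hits <- lapply(patterns, function(p) str_detect(string, p))
  # elementwise AND across all patterns, then subset
  string[Reduce(`&`, hits)]
}

character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")
pattern <- c("def", "pqr")

result <- str_subset_all(character_vector, pattern)
print(result)
```

One design note: logical indexing keeps duplicated elements of the input, whereas `intersect` returns each distinct value only once, so the two approaches can differ when the character vector contains repeats.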
akrun answered Jan 31 '23 10:01



You can do this in base R without a loop, using a regular expression with lookaheads. The code looks like this:

character_vector[grepl(paste0("(?=.*",pattern,")",collapse = ""), character_vector, perl = TRUE)]

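Here paste0 builds the pattern "(?=.*def)(?=.*pqr)", one lookahead per word. Each lookahead asserts that its word occurs somewhere in the string, so the regex matches only when all the words are present. grepl returns a logical vector marking the elements that satisfy the regex, and that vector is used to subset character_vector.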
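
The lookahead pattern built by paste0 can also be passed straight to str_subset, which is what the question originally asked for. This is a minimal sketch, assuming the words contain no regex metacharacters (those would need escaping first). The variable names `lookahead` and `both` are illustrative.

```r
library(stringr)

character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")
pattern <- c("def", "pqr")

# Build one lookahead per word and concatenate them
lookahead <- paste0("(?=.*", pattern, ")", collapse = "")
print(lookahead)

# stringr's regex engine (ICU) supports lookaheads, so no perl = TRUE is needed
both <- str_subset(character_vector, lookahead)
print(both)
```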

Alejandro Andrade answered Jan 31 '23 10:01
