I am trying to find elements in a character vector that match two words in no particular order, not just any single one of them, using the stringr::str_subset function. In other words, I'm looking for the intersection, not the union of the two words.
I tried using the "or" (|) operator but this only gives me either one of the two words and returns too many results. I also tried just passing a character vector with the two words as the pattern argument. This just returns the error that "longer object length is not a multiple of shorter object length" and only returns the values that match the second one of the two words.
library(stringr)
character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")
pattern <- c("def", "pqr")
str_subset(character_vector, pattern)
I'm looking for the pattern that will return only the second element of the character vector, i.e. "pqr abc def".
One option is str_detect. Loop over 'pattern', check whether every element of 'pattern' matches each element of 'character_vector' (combining the per-pattern results with &), then use the resulting logical vector to extract the matching elements from 'character_vector':
library(tidyverse)
map(pattern, str_detect, string = character_vector) %>%
reduce(`&`) %>%
magrittr::extract(character_vector, .)
#[1] "pqr abc def"
Or, using str_subset, take the intersection of the per-pattern results:
map(pattern, str_subset, string = character_vector) %>%
reduce(intersect)
#[1] "pqr abc def"
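If loading the tidyverse is not an option, the same idea (one logical vector per pattern, combined with &) can be sketched in base R. This is only an illustration, not part of the answer above: the helper name subset_all_patterns is hypothetical, and it swaps str_detect for grepl and purrr's map/reduce for lapply/Reduce.

```r
character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")
pattern <- c("def", "pqr")

# Hypothetical helper (not part of stringr or base R): keep only the
# elements of `string` that match every regex in `patterns`.
subset_all_patterns <- function(string, patterns) {
  # One logical vector per pattern: does each element of `string` match it?
  hits <- lapply(patterns, grepl, x = string)
  # Element-wise AND across all patterns
  keep <- Reduce(`&`, hits)
  string[keep]
}

result <- subset_all_patterns(character_vector, pattern)
result
#[1] "pqr abc def"
```

Because the patterns are checked one at a time, this works for any number of words and does not depend on the order in which they appear in the string.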
You can also do this in base R without a loop, using a regular expression built from lookaheads:
character_vector[grepl(paste0("(?=.*",pattern,")",collapse = ""), character_vector, perl = TRUE)]
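grepl returns a logical vector that is TRUE for each element matching the regex built by paste0. Each lookahead (?=.*word) asserts that the word occurs somewhere in the string, so chaining one lookahead per word requires all of the words to be present, in any order. perl = TRUE is needed because lookaheads are not supported by the default regex engine.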
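To make the one-liner easier to follow, here is the same approach split into steps, using the example data from the question. The intermediate names lookahead_regex and matches are introduced only for illustration.

```r
character_vector <- c("abc ghi jkl mno def", "pqr abc def", "abc jkl pqr")
pattern <- c("def", "pqr")

# Wrap each word in a lookahead and paste them into a single regex
lookahead_regex <- paste0("(?=.*", pattern, ")", collapse = "")
lookahead_regex
#[1] "(?=.*def)(?=.*pqr)"

# One TRUE/FALSE per element; perl = TRUE enables lookahead syntax
matches <- grepl(lookahead_regex, character_vector, perl = TRUE)
matches
#[1] FALSE  TRUE FALSE

# Logical indexing keeps only the elements containing both words
character_vector[matches]
#[1] "pqr abc def"
```

Note that the words are inserted into the regex as-is, so a word containing regex metacharacters (such as . or +) would need escaping first.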