
Detect a string with both AND and OR boolean operators in R

Tags:

regex

r

stringr

I have a text like this:

text = 'I love apple, pear, grape and peach'

If I want to know whether the text contains either apple or pear, I can do the following, which works fine:

str_detect(text,"apple|pear")
[1] TRUE

My question is: what if I want to use a boolean expression like (apple OR pear) AND (grape)? Is there any way to put that into str_detect()? The following does NOT work:

str_detect(text,"(apple|pear) & (grape)" )
[1] FALSE

The reason I ask is that I want to write a program that converts a 'boolean query' into a pattern and feeds it into grep or str_detect, something like:

str_detect(text, '(word1|word2) AND (word2|word3|word4) AND (word5|word6) AND .....')

The number of ANDs varies.

No solutions with multiple str_detect calls, please.

zesla asked Sep 17 '19 18:09




1 Answer

You can pass all the patterns to str_detect as a vector and check that they're all TRUE with all.

patterns <- c('apple|pear', 'grape')
all(str_detect(text, patterns))

Or with base R

all(sapply(patterns, grepl, x = text))
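Note that all() collapses everything into a single TRUE/FALSE, so the snippets above assume text is one string. If you have several strings, a minimal sketch (base R only; the texts vector here is made up for illustration and is not from the question) is to build one logical vector per pattern and AND them element-wise:

```r
# Illustrative data (an assumption, not from the original question).
texts <- c('I love apple, pear, grape and peach',
           'I love pear',
           'grape and pear only')
patterns <- c('apple|pear', 'grape')

# grepl() returns one logical vector per pattern (one entry per string);
# Reduce() then combines those vectors with element-wise AND.
hits <- Reduce(`&`, lapply(patterns, grepl, x = texts))
hits
# [1]  TRUE FALSE  TRUE
```

The second string fails because it has no grape; the other two satisfy both clauses.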

Or you could put the patterns in a list and use map (from purrr), which gives more detailed output for the ORs (or anything else you may want to put in a list element):

patterns <- list(c('apple', 'pear'), 'peach')
patterns %>% 
  map(str_detect, string = text)

# [[1]]
# [1] TRUE TRUE
# 
# [[2]]
# [1] TRUE

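It's also possible to write it as a single regular expression using lookaheads, but I see no reason to do this. Each pattern is wrapped in a group so that the | stays inside its own lookahead (otherwise (?=.*apple|pear) is read as ".*apple" OR "pear"), and ^ anchors all the lookaheads at the start of the string:

patterns <- c('apple|pear', 'grape')
patt_combined <- paste0('^', paste0('(?=.*(?:', patterns, '))', collapse = ''))
str_detect(text, patt_combined)

patt_combined is

# [1] "^(?=.*(?:apple|pear))(?=.*(?:grape))"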
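
Since the question mentions converting a 'boolean query' programmatically, here is a rough sketch of how that could look. query_to_regex is a hypothetical helper name (not part of stringr or base R), and it assumes the clauses are separated by the literal word AND, that each clause is already a valid regex, and that the text is a single line (. does not match newlines by default). It uses base R's grepl with perl = TRUE so that lookaheads are supported:

```r
# Hypothetical helper: split the query on " AND " and wrap every clause in
# its own lookahead, anchored at the start of the string.
query_to_regex <- function(query) {
  clauses <- trimws(strsplit(query, "\\s+AND\\s+", perl = TRUE)[[1]])
  # (?: ... ) keeps any | inside the clause from leaking out of the lookahead.
  paste0("^", paste0("(?=.*(?:", clauses, "))", collapse = ""))
}

text <- 'I love apple, pear, grape and peach'

rx <- query_to_regex("(apple|pear) AND (grape)")
rx
# [1] "^(?=.*(?:(apple|pear)))(?=.*(?:(grape)))"

grepl(rx, text, perl = TRUE)
# [1] TRUE

grepl(query_to_regex("(apple|pear) AND (kiwi)"), text, perl = TRUE)
# [1] FALSE
```

The same string should also work as the pattern in str_detect, since stringr's regex engine supports lookaheads too. This sketch does not handle nested AND/OR or a NOT operator.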

IceCreamToucan answered Sep 28 '22 16:09