
R regex to find two words in the same string; order and distance may vary

Tags: regex, r

I want to create a single regex (if possible) that searches through strings and determines whether two words occur in the same string. I know I can use two grepl statements (as seen below), but I want a single regex that tests for this condition. The more efficient the regex, the better.

I want to find strings that contain both "man" and "dog", matched case-insensitively.

x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired"
)

## this works but I want a single regex
grepl("dog", x, ignore.case=TRUE) & grepl("man", x, ignore.case=TRUE)
Tyler Rinker asked Sep 23 '15 13:09



1 Answer

Use the regex alternation operator | to cover both possible orders of the two words.

grepl(".*(dog.*man|man.*dog).*", x, ignore.case=TRUE)

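Use word boundaries (\\b in an R string) if the words must match as whole words, so that "man" does not match inside a longer word such as "woman".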

grepl(".*(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b).*", x, ignore.case=TRUE)
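
To see what the word boundaries change, here is a small sketch. The vector y is made up for illustration and is not part of the question; two of its strings contain the target words only as substrings ("woman", "dogma").

```r
# Hypothetical test strings (an assumption for illustration, not from the question)
y <- c(
    "The woman walks her dog.",
    "The man walks his dog.",
    "A dogma no man accepts."
)

# Without word boundaries, "man" matches inside "woman" and "dog" inside "dogma"
loose <- grepl("(dog.*man|man.*dog)", y, ignore.case=TRUE)

# With word boundaries, only whole-word occurrences count
strict <- grepl("(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b)", y, ignore.case=TRUE)

print(loose)   # TRUE TRUE TRUE
print(strict)  # FALSE TRUE FALSE
```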

The leading and trailing .* are unnecessary, because grepl reports a match found anywhere in the string:

grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE)
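
As a quick check on the question's sample data, the sketch below runs the pattern with and without the surrounding .* and compares the results. The variable names are chosen here for illustration.

```r
# Sample vector from the question
x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired"
)

# grepl searches for the pattern anywhere in each string,
# so wrapping the pattern in .* does not change which strings match
with_wildcards    <- grepl(".*(dog.*man|man.*dog).*", x, ignore.case=TRUE)
without_wildcards <- grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE)

print(without_wildcards)                             # TRUE TRUE FALSE TRUE FALSE
print(identical(with_wildcards, without_wildcards))  # TRUE
```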

You can also put the case-insensitive modifier (?i) inside the regex itself instead of passing ignore.case=TRUE.

grepl("(?i)(dog.*man|man.*dog)", x)
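
The sketch below compares the inline modifier with the ignore.case argument on the question's data. Note one assumption: it passes perl=TRUE, which the call above does not, because ?regex documents (?i) under Perl-like regular expressions. The variable names are illustrative.

```r
# Sample vector from the question
x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired"
)

# Inline (?i) flag; perl=TRUE is an addition made for this sketch
inline_flag <- grepl("(?i)(dog.*man|man.*dog)", x, perl=TRUE)

# Same pattern with case-insensitivity passed as an argument
argument_flag <- grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE)

print(identical(inline_flag, argument_flag))  # TRUE
```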
Avinash Raj answered Nov 11 '22 12:11
