How do I write a regular expression to match two given strings, at any position in the string?
For example, if I am searching for cat and mat, it should match:
The cat slept on the mat in front of the fire. At 5:00 pm, I found the cat scratching the wool off the mat.
No matter what precedes these strings.
To recognize multiple words in any order using regex, I'd suggest using a quantifier: (\b(james|jack)\b.*){2,}. Unlike lookaround or mode modifiers, this works in most regex flavours. Note that it also matches when the same word occurs twice.
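As a rough illustration of the quantifier approach, here is a minimal Python sketch. The helper name `mentions_two` is made up for this example, and the pattern is written without a space between `.` and `*`. The last call shows a limitation: the pattern counts occurrences, so the same word twice is also accepted.

```python
import re

# Quantifier-based pattern: at least two whole-word hits of james or jack.
two_names = re.compile(r"(\b(james|jack)\b.*){2,}")

def mentions_two(text: str) -> bool:
    """Return True if two or more whole-word occurrences are found."""
    return two_names.search(text) is not None

print(mentions_two("jack met james"))   # True
print(mentions_two("james met jack"))   # True
print(mentions_two("only jack here"))   # False
print(mentions_two("jack and jack"))    # True (same word twice also counts)
```

If both distinct words are required, one of the ordered or either-order patterns is a safer choice.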
Literal characters and sequences: for instance, you might need to search for a dollar sign ("$") as part of a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter which means "end of line" in regex, you must escape it with a backslash to use it literally.
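A small Python sketch of escaping the dollar sign; the sample price list is invented for illustration.

```python
import re

# Made-up sample data for the illustration.
price_list = "apple $3, pear $12, plum 7"

# "\$" matches a literal dollar sign; an unescaped "$" would act as an anchor.
prices = re.findall(r"\$\d+", price_list)
print(prices)  # ['$3', '$12']

# re.escape produces the escaped form of arbitrary literal text.
escaped_dollar = re.escape("$")
print(escaped_dollar)  # \$
```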
/^.*?\bcat\b.*?\bmat\b.*?$/m
Using the m modifier (which makes the beginning/end metacharacters match at line breaks rather than only at the very beginning and end of the string):
^ matches the beginning of the line;
.*? matches anything on the line before...
\b, the first occurrence of a word boundary (as @codaddict discussed), followed by...
cat and another word boundary; note that underscores are treated as "word" characters, so _cat_ would not match*;
.*? matches any characters before...
a word boundary, mat, and another word boundary;
.*? matches any remaining characters before...
$, the end of the line.
It's important to use \b to ensure the specified words aren't part of longer words. Non-greedy wildcards (.*?) are preferable to greedy ones (.*): on strings like "There is a cat on top of the mat which is under the cat.", a greedy .* would first try the last occurrence of "cat" rather than the first and then have to backtrack before the line matches.
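To see the pattern in action, here is a minimal Python sketch. It assumes `re.MULTILINE` as the counterpart of the m modifier, and the test lines other than the first are invented for the example.

```python
import re

# The ordered pattern from above; re.MULTILINE stands in for the /m modifier.
cat_then_mat = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)

text = "\n".join([
    "The cat slept on the mat in front of the fire.",
    "The mat was under the cat.",        # words in the wrong order
    "Concatenate the format strings.",   # cat/mat only inside longer words
    "The _cat_ sat on the mat.",         # underscore counts as a word character
])

# Collect every line that contains "cat" followed later by "mat".
matching_lines = [m.group(0) for m in cat_then_mat.finditer(text)]
print(matching_lines)
# ['The cat slept on the mat in front of the fire.']
```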
* If you want to be able to match _cat_, you can use:
/^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$/m
which matches either underscores or word boundaries around the specified words. (?:) indicates a non-capturing group, which can help with performance or avoid conflicting captures.
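A short Python comparison of the two variants; the names `strict` and `relaxed` and the sample line are only labels chosen for this sketch.

```python
import re

# Word-boundary-only pattern versus the variant that also accepts underscores.
strict = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)
relaxed = re.compile(
    r"^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$", re.MULTILINE
)

line = "The _cat_ sat on the _mat_."
print(strict.search(line) is not None)    # False
print(relaxed.search(line) is not None)   # True

# Non-capturing groups leave no captures behind.
print(relaxed.search(line).groups())      # ()
```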
Edit: A question was raised in the comments about whether the solution would work for phrases rather than just words. The answer is, absolutely yes. The following would match a line which includes both phrases in that order (replace "first phrase here" and "second phrase here" with the actual phrases):
/^.*?(?:\b|_)first phrase here(?:\b|_).*?(?:\b|_)second phrase here(?:\b|_).*?$/m
Edit 2: If order doesn't matter you can use:
/^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$/m
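The either-order pattern can be tried out with a small Python sketch, substituting cat and mat for the placeholder words; the sample lines are invented for illustration.

```python
import re

# Either-order pattern with "cat"/"mat" in place of "first"/"second".
either_order = re.compile(
    r"^.*?(?:\b|_)(cat(?:\b|_).*?(?:\b|_)mat|mat(?:\b|_).*?(?:\b|_)cat)(?:\b|_).*?$",
    re.MULTILINE,
)

forward = either_order.search("The cat slept on the mat.")
backward = either_order.search("The mat was under the cat.")
single = either_order.search("Only the cat is here.")

print(forward is not None, backward is not None, single is not None)
# True True False

# Group 1 holds the span from the first word found to the second.
print(backward.group(1))  # mat was under the cat
```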
And if performance is really an issue here, it's possible that lookaround (if your regex engine supports it) might perform better than the above, though it probably won't. I'll leave both the arguably more complex lookaround version and performance testing as an exercise to the questioner/reader.
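For readers who want a starting point, one possible shape of a lookaround version is sketched below in Python. This is an assumption about what such a pattern could look like, not the answer's own regex, and it makes no claim about relative performance.

```python
import re

# Hypothetical lookahead variant: each lookahead checks, from the start of
# the line, that the word occurs somewhere on that line, in any order.
both_words = re.compile(r"^(?=.*\bcat\b)(?=.*\bmat\b).*$", re.MULTILINE)

text = "\n".join([
    "The cat slept on the mat.",
    "The mat was under the cat.",
    "Only the cat is here.",
    "Concatenate the format strings.",
])

lines_with_both = [m.group(0) for m in both_words.finditer(text)]
print(lines_with_both)
# ['The cat slept on the mat.', 'The mat was under the cat.']
```

Adding a third required word only takes one more lookahead, whereas the alternation approach needs every ordering spelled out.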
Edited per @Alan Moore's comment. I didn't have a chance to test it, but I'll take your word for it.
(.*word1.*word2.*)|(.*word2.*word1.*)