
Regex unordered matches

Tags:

java

regex

This feels like it should be an extremely simple thing to do with regex, but I can't quite seem to figure it out.

I would like to write a regex that checks whether all of a list of words appear in a document, in any order, along with at least one of a set of other words.

In Boolean logic, the check would be: if allOfTheseWords are in the text and atLeastOneOfTheseWords is in the text, return true.

Example
I'm searching for (john and barbara) with (happy or sad). Order does not matter.

"Happy birthday john from barbara" => VALID
"Happy birthday john"              => INVALID

I simply cannot figure out how to get the "and" part to match in any order. Any help would be appreciated!

asked Jul 07 '11 at 21:07 by mrcleaver



2 Answers

You don't really want to use a regex for this unless the text is very small, which from your description I doubt.

A simple solution is to put all the words of the text into a HashSet; checking whether a given word is present then becomes a fast (average constant-time) lookup.
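A minimal sketch of the HashSet approach in Java. It assumes a "word" is a run of letters or digits and that matching should ignore case; the class and method names (WordSetCheck, toWordSet, matches) are hypothetical, chosen only for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordSetCheck {

    // Splits the text on runs of characters that are neither letters nor digits
    // and lower-cases each token. Assumption: a "word" is a run of letters/digits.
    static Set<String> toWordSet(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty first token
                words.add(token);
            }
        }
        return words;
    }

    // True if every word in allOf and at least one word in anyOf occur in the text.
    // Assumption: both lists are already given in lower case.
    static boolean matches(String text, List<String> allOf, List<String> anyOf) {
        Set<String> words = toWordSet(text);
        if (!words.containsAll(allOf)) {
            return false;
        }
        for (String w : anyOf) {
            if (words.contains(w)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> allOf = Arrays.asList("john", "barbara");
        List<String> anyOf = Arrays.asList("happy", "sad");
        System.out.println(matches("Happy birthday john from barbara", allOf, anyOf)); // true
        System.out.println(matches("Happy birthday john", allOf, anyOf));              // false
    }
}
```

Building the set costs one pass over the text; after that, each required or optional word is a single hash lookup, so adding more words to either list stays cheap.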

answered Sep 19 '22 at 02:09 by Michael Myers



If you want to do it with a regex, I'd try positive lookaheads:

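// searching for (john and barbara) with (happy or sad)
// written as a Java string literal, so each regex \b is doubled to \\b;
// compile with Pattern.CASE_INSENSITIVE so that "Happy" also matches
"^(?=.*\\bjohn\\b)(?=.*\\bbarbara\\b).*\\b(happy|sad)\\b"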
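A minimal sketch of running that lookahead pattern from Java (the class and method names are hypothetical). It adds two flags the bare pattern lacks: CASE_INSENSITIVE, without which "Happy birthday john from barbara" would not match happy, and DOTALL, so that . can cross line breaks in a multi-line document:

```java
import java.util.regex.Pattern;

public class LookaheadCheck {

    // One lookahead per required word, then a search for any of the optional words.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
            "^(?=.*\\bjohn\\b)(?=.*\\bbarbara\\b).*\\b(happy|sad)\\b",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // find() is sufficient: the pattern is anchored with ^ and does not need
    // to consume the whole text, as matches() would require.
    static boolean matches(String text) {
        return PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("Happy birthday john from barbara")); // true
        System.out.println(matches("Happy birthday john"));              // false
    }
}
```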

The performance should be comparable to doing a full text search for each of the words in the allOfTheseWords group separately.
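For arbitrary word lists, the same pattern can be generated with one lookahead per allOfTheseWords entry and an alternation of the atLeastOneOfTheseWords entries. This is a sketch under the same assumptions (hypothetical names, case-insensitive matching, whole-word boundaries via \b); Pattern.quote is used so that any regex metacharacters inside the words are treated literally:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LookaheadBuilder {

    // Builds ^(?=.*\bW1\b)(?=.*\bW2\b)....*\b(O1|O2|...)\b from two word lists:
    // one lookahead per required word, then an alternation of the optional words.
    // Assumes anyOf is non-empty.
    static Pattern build(List<String> allOf, List<String> anyOf) {
        StringBuilder sb = new StringBuilder("^");
        for (String w : allOf) {
            sb.append("(?=.*\\b").append(Pattern.quote(w)).append("\\b)");
        }
        sb.append(".*\\b(")
          .append(anyOf.stream().map(Pattern::quote).collect(Collectors.joining("|")))
          .append(")\\b");
        // Case-insensitive so "Happy" matches "happy"; DOTALL so '.' also crosses newlines.
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList("john", "barbara"), Arrays.asList("happy", "sad"));
        System.out.println(p.matcher("Happy birthday john from barbara").find()); // true
        System.out.println(p.matcher("Happy birthday john").find());              // false
    }
}
```

Each lookahead rescans the text from the start, which is why the cost grows roughly with the number of required words.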

answered Sep 19 '22 at 02:09 by Christian Semrau
