Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular Expression to MATCH ALL words in a query, in any order

I'm trying to build a search feature for a project which narrows down items based on a user search input and if it matches the keywords listed against items. For this, I'm saving the item keywords in a data attribute and matching the query with these keywords using a RegExp pattern.

I'm currently using this expression, which I know is not correct and need your help on that:

new RegExp('\\b(' + query + ')', 'gi'))) where query is | separated values of the query entered by the user (e.g. \\b(meat|pasta|dinner)). This returns me a match even if there is only 1 match, say for example - meat

Just to throw some context, here's a small example:

If a user types: meat pasta dinner it should list all items which have ALL the 3 keywords listed against them i.e. meat pasta and dinner. These are independent of the order they're typed in.

Can you help me with an expression which will match ALL words in a query, in any order?

like image 670
kayen Avatar asked Dec 17 '12 09:12

kayen


People also ask

How do you regex multiple words?

However, to recognize multiple words in any order using regex, I'd suggest the use of quantifier in regex: (\b(james|jack)\b. *){2,} . Unlike lookaround or mode modifier, this works in most regex flavours.

How do I match a pattern in regex?

Most characters, including all letters ( a-z and A-Z ) and digits ( 0-9 ), match itself. For example, the regex x matches substring "x" ; z matches "z" ; and 9 matches "9" . Non-alphanumeric characters without special meaning in regex also matches itself. For example, = matches "=" ; @ matches "@" .

How do I find all matches in regex?

To find find all the matches of a regular expression in this string in JavaScript, call match() method on this string, and pass the regular expression as argument. match() method returns an array of strings containing all the matches found for the given regular expression, in this string.


2 Answers

You can achieve this will lookahead assertions

^(?=.*\bmeat\b)(?=.*\bpasta\b)(?=.*\bdinner\b).+

See it here on Regexr

(?=.*\bmeat\b) is a positive lookahead assertion, that ensures that \bmeat\b is somewhere in the string. Same for the other keywords and the .+ is then actually matching the whole string, but only if the assertions are true.

But it will match also on "dinner meat Foobar pasta"

like image 120
stema Avatar answered Oct 19 '22 10:10

stema


stema's answer is technically correct, but it doesn't take performance into account at all. Look aheads are extremely slow (in the context of regular expressions, which are lightning fast). Even with the current logic, the regular expression is not optimal.

So here are some measurements, calculated on larger strings which contain all three words, running the search 1000 times and using four different approaches:

stema's regular expression

/^(?=.*\bmeat\b)(?=.*\bpasta\b)(?=.*\bdinner\b).+/

result: 605ms

optimized regular expression

/^(?=.*?\bmeat\b)(?=.*?\bpasta\b)(?=.*?\bdinner\b)/

uses lazy matching and doesn't need the end all selector

result: 291ms

permutation regular expression

/(\bmeat\b.*?(\bpasta\b.*?\bdinner\b|\bdinner\b.*?\bpasta\b)|\bpasta\b.*?(\bmeat\b.*?\bdinner\b|\bdinner\b.*?\bmeat\b)|\bdinner\b.*?(\bpasta\b.*?\bmeat\b|\bmeat\b.*?\bpasta\b))/

result: 56ms

this is fast because the first pattern is matching, if the last pattern matched, it would be even slower than the look ahead one (300 ms)

array of regular expressions

var regs=[/\bmeat\b/,/\bpasta\b/,/\bdinner\b/];
var result = regs.every(reg=>reg.test(text));

result: 26ms

Note that if the strings are crafted to not match, then the results are:

  • 521ms
  • 220ms
  • 161ms - much slower because it has to go through all the branches
  • 14ms

As you can see, in all cases just using a loop is an order of magnitude faster, not to mention easier to read.

The original question was asking for a regular expression, so my answer to that is the permutation regular expression, but I would not use it, as its size would grow exponentially with the number of search words.

Also, in most cases this performance issue is academic, but it is necessary to be highlighted.

like image 7
Siderite Zackwehdex Avatar answered Oct 19 '22 10:10

Siderite Zackwehdex