Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Java regex optimize this specific case?

Tags:

java

regex

I wonder how does regex work, my particular regex has an element that looks like this:

(word1|word2|wordn......)

The numbers of words is big several hundreds.
I wonder if the regex engine is just testing the words one by one or if it optimizes the search and it what way.
Any pointer to good documentation will be good.

like image 232
millebii Avatar asked Nov 06 '22 03:11

millebii


1 Answers

If you have several hundred words, you need to beware of the ordering of the words in the regex. The regex engine looks for the words from left to right.
If you test the word setValue against the alternation set|setValue, it will match only the 3 letters comprising "set", and not the whole string.

See this link (from www.regular-expressions.info) for the full explanation.

I don't think that the regex engine truly optimizes alternations (i.e., analyzing common prefixes and building nfa accordingly). Therefore, with so many words, I don't think it will be an optimization.

Aside from re-ordering the words, you can also try adding word or line boundary after the alternation, e.g. (set|setValue)$, but I suspect that the regex engine will do a lot of backtracking so it may not be worth the effort.

like image 159
Yoni Avatar answered Nov 13 '22 16:11

Yoni