I have List collection with around 35,000 strings
Typical string looks like this:
"<i>füüs</i>ampri tähis;lüh ld-st<i>anno</i>, aastal;<i>maj</i> lüh pr-st<i>argent</i>, raha (kursisedelitel)"
Basically this string contains bunch of words in Estonian :)
I need to allow user to perform RegExp search on 35,000 strings
If I perform search using /ab.*/ expression, then search takes less than a second
If I perform search using /.*ab/ expression, then search takes around 10 seconds
My question is: How can I make second search faster (less then 1.5 seconds)?
Thank You very much
It’s how regular expressions are processed that makes them perform so different. To explain that based on your examples:
/.*ab/ This expression consists on two sub-expressions, the .* and the literal ab. This would be processed as follows: In the normal greedy mode, where every quantor and thus the match is expanded to its maximum, the .* will first match the whole string. Then it will try to match the following ab. But as it is not possible (we’re already at the end of the string) backtracking will be used to find the last point where both sub-expressions match. So the match of .* is reduced by one character and again the ab is tested. If the whole expression cannot be matched, the match of .* again will be reduced by one character until the whole expression is matched. In the worst case there is no ab in the string and the algorithm will do n+1 backtracks and additional tests for ab until it can determine that a match is impossible.
/ab.*/ This expression consists of two sub-expressions too. But here the order is changed, first the literal ab and than the .*. This is processed as follows: The algorithm first tries to find the literal ab by comparing one character by another. If there is a match, it tries to find a match for .* that is obvious easy.
The main difference between those two regular expressions is, that the second has the static part at the beginning and the variable part at the end. This makes no backtracking necessary.
Try some regular expression analysis tool such as RegexBuddy to see the difference visually.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With