Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do these regular expressions execute slowly in Java?

I am trying to use regular expressions to determine what format the user have applied when entering input in a textbox.
The regular expressions are as follows:

(\\s?[" + alphabet + "]{9,9})+

To determine whether the input is one or more strings of length 9 in a given alphabet, possibly separated by whitespace.

(>[\\w\\s]+\\n[" + alphabet + "\\s]+)+

To check if the input is in FASTA format

The regular expressions run terribly slow when matching with inputString.matches(regexString). Why is this?

I figured this may be due to Java storing all potential matches (which I don't need at this point), but adding ?: in every parenthesis breaks the regex. How should this be done?

Thank you,

Martin

Edit 1: I was unable to reproduce this issue - it only happens on one computer. This could suggest something wrong with that particular VM setup.
We need something more robust, and so we will be implementing this differently. I have picked Joel's answer as the right one, since I believe that some special case in Pattern may be the cause.

like image 765
Martin Wiboe Avatar asked Jun 27 '10 13:06

Martin Wiboe


2 Answers

string.matches() compile the regex every time you do it. Instead, look at the Pattern/Matcher classes, which allow you to cache precompiled regexes.

Another thing is to use non-capturing regex groups if you don't need the result of the matching.

like image 135
ddimitrov Avatar answered Nov 07 '22 12:11

ddimitrov


this might not explain your particular problem. but once I dived into JDK's regex implementation, and I was surprised at how unsophisticated it is. it doesn't really build a state machine that advances at each input char. I assume they have their reasons.

in your case, it is so easy to write a parse by yourself, by hand. people fear to do that, it seems "dumb" to manually code these tiny steps, and people think established libraries must be doing some splendid tricks to outperform home grown solutions. that's not true. in many cases, our needs are rather simple, and it is simpler and faster to DIY.

like image 28
irreputable Avatar answered Nov 07 '22 13:11

irreputable