Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular Expressions in C# running slowly

I have been doing a little work with regex over the past week and managed to make a lot of progress, however, I'm still fairly n00b. I have a regex written in C#:

string isMethodRegex = 
    @"\b(public|private|internal|protected)?\s*(static|virtual|abstract)?"+
    @"\s*(?<returnType>[a-zA-Z\<\>_1-9]*)\s(?<method>[a-zA-Z\<\>_1-9]+)\s*\"+
    @"((?<parameters>(([a-zA-Z\[\]\<\>_1-9]*\s*[a-zA-Z_1-9]*\s*)[,]?\s*)+)\)";
IsMethodRegex = new Regex(isMethodRegex);

For some reason, when calling the regular expression IsMethodRegex.IsMatch() it hangs for 30+ seconds on the following string:

"\t * Returns collection of active STOP transactions (transaction type 30) "

Does anyone how the internals of Regex works and why this would be so slow on matching this string and not others. I have had a play with it and found that if I take out the * and the parenthesis then it runs fine. Perhaps the regular expression is poorly written?

Any help would be so greatly appreciated.

like image 531
Joshy Avatar asked Oct 04 '11 00:10

Joshy


1 Answers

EDIT: I think the performance issue comes in the way <parameters> matching group is done. I have rearranged to match a first parameter, then any number of successive parameters, or optionally none at all. Also I have changed the \s* between parameter type and name to \s+ (I think this was responsible for a LOT of backtracking because it allows no spaces, so that object could match as obj and ect with \s* matching no spaces) and it seems to run a lot faster:

string isMethodRegex = 
    @"\b(public|private|internal|protected)?\s*(static|virtual|abstract)?"+
    @"\s*(?<returnType>[a-zA-Z\<\>_1-9]*)\s*(?<method>[a-zA-Z\<\>_1-9]+)\s*\"+
    @"((?<parameters>((\s*[a-zA-Z\[\]\<\>_1-9]*\s+[a-zA-Z_1-9]*\s*)"+
    @"(\s*,\s*[a-zA-Z\[\]\<\>_1-9]*\s+[a-zA-Z_1-9]*\s*)*\s*))?\)";

EDIT: As duly pointed out by @Dan, the following is simply because the Regex can exit early.

This is indeed a really bizarre situation, but if I remove the two optional matching at the beginning (for public/private/internal/protected and static/virtual/abstract) then it starts to run almost instantaneously again:

string isMethodRegex = 
    @"\b(public|private|internal|protected)\s*(static|virtual|abstract)"+
    @"(?<returnType>[a-zA-Z\<\>_1-9]*)\s(?<method>[a-zA-Z\<\>_1-9]+)\s*\"+
    @"((?<parameters>(([a-zA-Z\[\]\<\>_1-9]*\s*[a-zA-Z_1-9]*\s*)[,]?\s*)+)\)";
var IsMethodRegex = new Regex(isMethodRegex);

string s = "\t * Returns collection of active STOP transactions (transaction type 30) ";

Console.WriteLine(IsMethodRegex.IsMatch(s));

Technically you could split into four separate Regex's for each possibility to deal with this particular situation. However, as you attempt to deal with more and more complicated scenarios, you will likely run into this performance issue again and again, so this is probably not the ideal approach.

like image 132
mellamokb Avatar answered Oct 22 '22 01:10

mellamokb