I have a large and complex C# regex that runs OK when interpreted, but is a bit slow. I'm trying to speed this up by setting RegexOptions.Compiled, and this seems to take about 30 seconds the first time and run instantly after that. I'm trying to avoid this by compiling the regex to an assembly first, so my app can be as fast as possible.
My problem is when the compiling delay takes place, whether it's compiled in the app:
Regex myComplexRegex = new Regex(regexText, RegexOptions.Compiled);
MatchCollection matches = myComplexRegex.Matches(searchText);
foreach (Match match in matches) // <--- when the one-time long delay kicks in
{
}
or using Regex.CompileToAssembly in advance:
MatchCollection matches = new CompiledAssembly.ComplexRegex().Matches(searchText);
foreach (Match match in matches) // <--- when the one-time long delay kicks in
{
}
This makes compiling to an assembly basically useless, as I still get the delay on the first foreach call. What I want is for all the compiling delay to happen at compile time (at the Regex.CompileToAssembly call), and not at runtime. Where am I going wrong?
(The code I'm using to compile to an assembly is similar to http://www.dijksterhuis.org/regular-expressions-advanced/, if that's relevant.)
Edit:
Should I be using new when calling the compiled assembly in new CompiledAssembly.ComplexRegex().Matches(searchText);? It gives an "object reference required" error without it, though.
Update 2
Thanks for the answers/comments. The regex that I'm using is pretty long but basically straightforward, a list of thousands of words each separated by |. I can't see it'd be a backtracking problem really. The subject string can be just one letter long, and it can still cause the compilation delay. For a RegexOptions.Compiled regex, it'll take over 10 seconds to execute when the regex contains 5000 words. For comparison, the non-compiled version of the regex can take 30,000+ words and still execute just about instantly.
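One way to see where the time goes is to time construction and the first match separately. The following is only a minimal sketch: the word list is synthetic (word0, word1, ...), the class and method names are made up for illustration, and the numbers will vary by machine and .NET version. Note that Matches is lazily evaluated, so the matching work only happens once the collection is enumerated or its Count is read, which is why a delay can appear to sit on the foreach line.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTimingSketch
{
    // Builds "word0|word1|...": a stand-in for the real word list (assumption).
    public static string BuildPattern(int wordCount) =>
        string.Join("|", Enumerable.Range(0, wordCount).Select(i => "word" + i));

    // Returns the construction time and the first-match time in milliseconds.
    public static (long ConstructMs, long FirstMatchMs) Measure(
        string pattern, RegexOptions options, string input)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        long constructMs = sw.ElapsedMilliseconds;

        sw.Restart();
        // Reading Count forces the lazily evaluated MatchCollection to run.
        _ = regex.Matches(input).Count;
        long firstMatchMs = sw.ElapsedMilliseconds;

        return (constructMs, firstMatchMs);
    }

    public static void Main()
    {
        string pattern = BuildPattern(5000);
        foreach (var options in new[] { RegexOptions.None, RegexOptions.Compiled })
        {
            var (constructMs, firstMatchMs) = Measure(pattern, options, "x");
            Console.WriteLine(
                $"{options}: construct {constructMs} ms, first match {firstMatchMs} ms");
        }
    }
}
```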
After doing a lot of testing on this, what I think I've found out is:
Please correct me if I'm wrong or missing something!
Regex has an interpreted mode and a compiled mode. The compiled mode takes longer to start, but is generally faster.
The reason the regex is so slow is that the "*" quantifier is greedy by default, and so the first ".*" tries to match the whole string, and after that begins to backtrack character by character. The runtime is exponential in the count of numbers on a line.
My experience shows that most of the time developers focus on the correctness of a regex and neglect its performance. Yet matching a string with a regex can be surprisingly slow. So slow that it can stall a JS app or take 100% of a server's CPU time, causing denial of service (DoS).
Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.
When using RegexOptions.Compiled, you should make sure to re-use the Regex object. It doesn't seem like you are doing this.
RegexOptions.Compiled is a trade-off. The initial construction of the Regex will be slower, because code is compiled on-the-fly, but each match should be faster. If your regular expression changes at run-time, there will probably be no benefit from using RegexOptions.Compiled, although it might depend on the actual expression involved.
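A minimal sketch of re-using the instance, assuming the pattern is fixed for the lifetime of the process. The class name, the method name and the three-word pattern are placeholders, not taken from the question:

```csharp
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Placeholder pattern; the real one would be the long word list.
    private const string Pattern = "apple|banana|cherry";

    // Constructed once per process, so the compilation cost is paid only once.
    private static readonly Regex Words = new Regex(Pattern, RegexOptions.Compiled);

    // Every call re-uses the same compiled Regex instance.
    public static int CountMatches(string searchText) =>
        Words.Matches(searchText).Count;
}
```

This does not remove the one-time cost; it only makes sure it is not paid again on later calls.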
If your actual code looks like the one you have posted, you are not taking any advantage of CompileToAssembly, as you are creating new, on-the-fly compiled instances of Regex each time that piece of code runs. In order to take advantage of CompileToAssembly, you will need to compile the Regex first; then take the generated assembly and reference it in your project. You should then instantiate the generated, strongly-typed Regex types.
In the example you link to, he has a regular expression named FindTCPIP, which gets compiled into a type named FindTCPIP. When this needs to be used, one should create a new instance of this specific type, such as:
TheRegularExpressions.FindTCPIP MatchTCP = new TheRegularExpressions.FindTCPIP();
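The separate build step that produces such an assembly could look roughly like the sketch below. It assumes the .NET Framework, since Regex.CompileToAssembly is not supported on .NET Core / .NET 5 and later. The pattern is a placeholder, and the builder class and its method names are invented for illustration; only the type and namespace names follow the linked example.

```csharp
using System.Reflection;
using System.Text.RegularExpressions;

public static class RegexAssemblyBuilder
{
    // Describes one regex to be emitted as a strongly-typed class.
    public static RegexCompilationInfo CreateInfo() =>
        new RegexCompilationInfo(
            @"\d{1,3}(\.\d{1,3}){3}",   // placeholder pattern (assumption)
            RegexOptions.None,
            "FindTCPIP",                // name of the generated type
            "TheRegularExpressions",    // namespace of the generated type
            true);                      // make the generated type public

    // Run once, ahead of time; writes TheRegularExpressions.dll to the
    // current directory. The application then references that DLL.
    public static void Main()
    {
        Regex.CompileToAssembly(
            new[] { CreateInfo() },
            new AssemblyName("TheRegularExpressions"));
    }
}
```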