
Regex slow on Windows Server 2008

Tags: c#, .net, regex

I have a situation where my regular expressions compile extremely slowly on Windows Server 2008. I wrote a small console application to highlight this issue. The app generates its own input and builds up a Regex from words in an XML file. I built a release version of this app and ran it both on my personal laptop (running XP) and the Windows 2008 server. The regular expression took 0.21 seconds to compile on my laptop, but 23 seconds to compile on the server.

Any ideas what could be causing this? The problem occurs only on first use of the Regex (when it is first compiled); thereafter it is fine.

I have also found another problem - when using \s+ in the regular expression on the same Windows 2008 server, the memory balloons (uses 4GB+) and the compilation of the Regex never finishes.

Is there a known issue with Regex and 64-bit .NET? Is there a fix or patch available for this? I cannot really find any info on the net, but I have found a few articles about the same issue in Framework 2.0. Surely this has been fixed by now?

More info: The server is running the 64-bit version of the .NET Framework (3.5 SP1), and on my laptop I have Visual Studio 2008 and the 3.5 framework installed. The regular expression has the following pattern: ^word$|^word$|^word$ and is constructed with the following flags: RegexOptions.IgnoreCase | RegexOptions.Compiled


Here is a code snippet:

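// Build a single alternation of the form ^word$|^word$|... from the XML word list.
StringBuilder regexString = new StringBuilder();
if (!String.IsNullOrEmpty(fileLocation))
{
    // Dispose the reader so the file handle is released.
    using (XmlTextReader textReader = new XmlTextReader(fileLocation))
    {
        textReader.Read();
        while (textReader.Read())
        {
            textReader.MoveToElement();
            if (textReader.Name == "word")
            {
                regexString.Append("^" + textReader.GetAttribute(0) + "$|");
            }
        }
    }
    // Skip construction when no words were read; ToString(0, -1) would throw.
    if (regexString.Length > 0)
    {
        // Drop the trailing '|' before building the Regex.
        ProfanityFilter = new Regex(regexString.ToString(0, regexString.Length - 1), RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }
}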

DateTime time = DateTime.Now;
Console.WriteLine("\nIsProfane:\n" + ProfanityFilter.IsMatch("test"));
Console.WriteLine("\nTime: " + (DateTime.Now - time).TotalSeconds);
Console.ReadKey();

This results in a time of 0.21 seconds on my laptop and 23 seconds on the 2008 server. The XML file consists of 168 words in the following format:

<word text="test" />
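The snippet above times only the first IsMatch call, so it cannot show whether the delay sits in the Regex constructor or in the first match. One way to narrow this down is to time each step separately with Stopwatch. The following is a minimal sketch, not the original test app: the class name, the generated word list ("word0" to "word167") and the use of Regex.Escape are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTimingSketch
{
    // Builds ^word$|^word$|... from the given words.
    // Regex.Escape is an addition so that metacharacters in a word match literally.
    public static string BuildPattern(string[] words)
    {
        return string.Join("|", words.Select(w => "^" + Regex.Escape(w) + "$").ToArray());
    }

    // Returns { construction ms, first match ms, second match ms }.
    public static double[] Measure(string pattern, RegexOptions options, string input)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Regex regex = new Regex(pattern, options);
        double construction = sw.Elapsed.TotalMilliseconds;

        sw = Stopwatch.StartNew();
        regex.IsMatch(input);
        double firstMatch = sw.Elapsed.TotalMilliseconds;

        sw = Stopwatch.StartNew();
        regex.IsMatch(input);
        double secondMatch = sw.Elapsed.TotalMilliseconds;

        return new[] { construction, firstMatch, secondMatch };
    }

    public static void Main()
    {
        // Hypothetical stand-in for the 168 words in the XML file.
        string[] words = Enumerable.Range(0, 168).Select(i => "word" + i).ToArray();
        string pattern = BuildPattern(words);

        RegexOptions[] variants =
        {
            RegexOptions.IgnoreCase,
            RegexOptions.IgnoreCase | RegexOptions.Compiled
        };
        foreach (RegexOptions options in variants)
        {
            double[] t = Measure(pattern, options, "test");
            Console.WriteLine("{0}: construct {1:F1} ms, first match {2:F1} ms, second match {3:F1} ms",
                options, t[0], t[1], t[2]);
        }
    }
}
```

Running the same build on both machines, with and without RegexOptions.Compiled, should show which step accounts for the difference.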
pjmyburg asked Sep 29 '09 08:09


2 Answers

I found a solution. It is admittedly not the proper one, but it is perfect in my case. For some reason, if I leave out the RegexOptions.Compiled flag, the Regex is much, much faster. I even managed to run the Regex against 100 long phrases in under 65 milliseconds on the 2008 server.

This looks like a bug in the .NET library, since the compiled version is supposed to match faster than the uncompiled one (at the price of a higher one-time construction cost). Either way, under 1 millisecond per check is very much acceptable for me :)
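For reference, the workaround amounts to constructing the same pattern with RegexOptions.IgnoreCase only. The sketch below is illustrative rather than the original code: the class and method names and the two sample words are made up, and Regex.Escape is an addition that treats every word as literal text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InterpretedFilterSketch
{
    // Builds the ^word$|^word$|... filter without RegexOptions.Compiled,
    // so the pattern is run by the regex interpreter instead of generated IL.
    public static Regex BuildFilter(string[] words)
    {
        string pattern = string.Join("|", words.Select(w => "^" + Regex.Escape(w) + "$").ToArray());
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical word list; the real one comes from the XML file.
        Regex filter = BuildFilter(new[] { "test", "word" });

        Console.WriteLine(filter.IsMatch("TEST"));    // True: whole-string match, case-insensitive
        Console.WriteLine(filter.IsMatch("testing")); // False: anchors require the whole string to match
    }
}
```

Building the Regex once and reusing the instance keeps the per-check cost down to the match itself.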

pjmyburg answered Sep 21 '22 04:09


You can pre-compile your regexes using the Regex.CompileToAssembly method, and then you could deploy the compiled regexes to your server.
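A rough sketch of that approach follows. It assumes the .NET Framework (Regex.CompileToAssembly is not supported on .NET Core or .NET 5 and later), and the type name, namespace, assembly name and sample pattern are all hypothetical.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public static class PrecompileSketch
{
    // Describes one regex to be emitted as a public class in the output assembly.
    public static RegexCompilationInfo DescribeFilter(string pattern)
    {
        return new RegexCompilationInfo(
            pattern,
            RegexOptions.IgnoreCase,
            "ProfanityFilterRegex", // hypothetical name of the generated class
            "Precompiled",          // hypothetical namespace
            true);                  // make the generated class public
    }

    public static void Main()
    {
        // Hypothetical pattern; the real one is built from the XML word list.
        RegexCompilationInfo info = DescribeFilter("^test$|^word$");

        // Hypothetical assembly name; the DLL is written to the current directory.
        AssemblyName assemblyName = new AssemblyName("PrecompiledRegexes");

        // .NET Framework only: emits PrecompiledRegexes.dll containing the regex class.
        Regex.CompileToAssembly(new[] { info }, assemblyName);
    }
}
```

The application on the server would then reference the generated DLL and create the filter with new Precompiled.ProfanityFilterRegex(), which moves the compilation work to build time. Whether this also avoids the slow first use seen on the 64-bit server would need to be measured.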

Polyfun answered Sep 23 '22 04:09