
Why Regex CompileToAssembly giving slower performance than compiled regex and Interpreted Regex?

Tags: c#, .net, regex

I am using the following code to compare the performance of a regex built with Regex.CompileToAssembly against one created with RegexOptions.Compiled and an interpreted one, but the results are not what I expected. Please let me know what I am missing. Thanks!

    static readonly Regex regex = new Regex(@"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)", RegexOptions.Compiled);
    static readonly Regex reg = new Regex(@"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)");
    static readonly Regex level4 = new DuplicatedString();

    static void Main()
    {
        const string str = "add time 243,3453,43543,543,534534,54534543,345345,4354354235,345435,34543534 6873brekgnfkjerkgiengklewrij";
        const int itr = 1000000;
        CompileToAssembly();
        Match match;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < itr; i++)
        {
             match = regex.Match(str);
        }
        sw.Stop();
        Console.WriteLine("RegexOptions.Compiled: {0}ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < itr; i++)
        {
            match = level4.Match(str);
        }
        sw.Stop();

        Console.WriteLine("CompiledToAssembly: {0}ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < itr; i++)
        {
            match = reg.Match(str);
        }
        sw.Stop();
        Console.WriteLine("Interpreted: {0}ms", sw.ElapsedMilliseconds);
        Console.ReadLine();
    }

    public static void CompileToAssembly()
    {
        RegexCompilationInfo expr;
        List<RegexCompilationInfo> compilationList = new List<RegexCompilationInfo>();

        // Define the command-matching regular expression to compile into the assembly
        expr = new RegexCompilationInfo(@"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)",
                   RegexOptions.Compiled,
                   "DuplicatedString",
                   "Utilities.RegularExpressions",
                   true);
        // Add info object to list of objects
        compilationList.Add(expr);

        // Apply AssemblyTitle attribute to the new assembly
        //
        // Define the parameter(s) of the AssemblyTitle attribute's constructor 
        Type[] parameters = { typeof(string) };
        // Define the assembly's title
        object[] paramValues = { "General-purpose library of compiled regular expressions" };
        // Get the ConstructorInfo object representing the attribute's constructor
        ConstructorInfo ctor = typeof(System.Reflection.AssemblyTitleAttribute).GetConstructor(parameters);
        // Create the CustomAttributeBuilder object array
        CustomAttributeBuilder[] attBuilder = { new CustomAttributeBuilder(ctor, paramValues) };

        // Generate assembly with compiled regular expressions
        RegexCompilationInfo[] compilationArray = new RegexCompilationInfo[compilationList.Count];
        AssemblyName assemName = new AssemblyName("RegexLib, Version=1.0.0.1001, Culture=neutral, PublicKeyToken=null");
        compilationList.CopyTo(compilationArray);
        Regex.CompileToAssembly(compilationArray, assemName, attBuilder);
    }

The following are the results:

RegexOptions.Compiled: 3908ms
CompiledToAssembly: 59349ms
Interpreted: 5653ms
iRock asked Apr 09 '12 20:04


2 Answers

Your code has a problem: static field initializers run before any static method of the class, including Main(). That means level4 has already been assigned before Main() calls CompileToAssembly(), so the object referred to by level4 cannot be an instance of the class that this run's CompileToAssembly() generates.

Note that the example code for Regex.CompileToAssembly shows the compilation of the regex and its consumption in two different programs. The actual regex you're timing as "CompiledToAssembly" could therefore be a different regex that you compiled in an earlier test.
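The ordering issue can be illustrated with a minimal sketch. It does not use the question's regex; InitOrderDemo, Log, Field, Record, and Run are hypothetical names introduced only to record which code runs first.

```csharp
using System;
using System.Collections.Generic;

static class InitOrderDemo
{
    // Hypothetical log that records the order of events.
    // Declared first so it exists when the next initializer runs.
    public static readonly List<string> Log = new List<string>();

    // Static field initializer: runs during type initialization,
    // before the body of Run() appends its own entry.
    static readonly string Field = Record("field initializer");

    static string Record(string what)
    {
        Log.Add(what);
        return what;
    }

    // Stands in for Main(): by the time this body appends its entry,
    // the static fields above have already been initialized.
    public static void Run()
    {
        Record("method body");
    }
}
```

After a single call to InitOrderDemo.Run(), Log should hold "field initializer" followed by "method body". In the question's code, level4 plays the role of Field and the call to CompileToAssembly() plays the role of the method body.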

Another factor to consider: the overhead of loading an assembly into memory and jitting it to machine code might be significant enough that you need more than 1,000,000 iterations to see a benefit.
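One way to keep that one-time cost out of the measurement is to make an untimed warm-up call before starting the stopwatch. The helper below is a rough sketch under that assumption; RegexTiming and Measure are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexTiming
{
    // Hypothetical helper: times `iterations` calls to regex.Match(input)
    // after one untimed warm-up call that absorbs one-time costs such as
    // assembly loading and JIT compilation.
    public static TimeSpan Measure(Regex regex, string input, int iterations)
    {
        regex.Match(input); // warm-up, deliberately not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Match(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Each of the three Regex instances in the question could be passed through the same helper, for example RegexTiming.Measure(regex, str, itr), so that all of them are measured in the same way.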

phoog answered Oct 15 '22 02:10


You are running under a debugger (Visual Studio), which prevents JIT optimizations from being applied when an assembly is loaded. Try running without the debugger (Ctrl+F5).
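A benchmark can also check for this itself. The sketch below relies on System.Diagnostics.Debugger.IsAttached; BenchmarkGuard, WarningFor, and WarnIfDebugging are hypothetical names, and the warning text is only an example.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkGuard
{
    // Hypothetical helper: returns a warning when a debugger is attached,
    // or null otherwise. The flag is a parameter so the logic can be
    // exercised without actually attaching a debugger.
    public static string WarningFor(bool debuggerAttached)
    {
        return debuggerAttached
            ? "Debugger attached: JIT optimizations may be disabled, so timings are unreliable."
            : null;
    }

    // Intended to be called at the top of Main(), before any Stopwatch starts.
    public static void WarnIfDebugging()
    {
        string warning = WarningFor(Debugger.IsAttached);
        if (warning != null)
        {
            Console.WriteLine(warning);
        }
    }
}
```

This only reports the situation; the timings still need to be taken from a run started without the debugger.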

usr answered Oct 15 '22 02:10