
Compiled regex performance not as expected?

Tags: c#, .net, regex

I'm reading Joe Albahari's C# 5.0 in a Nutshell, and in Chapter 26, on regular expressions, he states:

In some of the preceding examples, we called a static RegEx method repeatedly with the same pattern. An alternative approach in these cases is to instantiate a Regex object with the pattern and then call instance methods . . .

// Code example from the book
Regex r = new Regex (@"sausages?");
Console.WriteLine (r.Match ("sausage"));   // sausage
Console.WriteLine (r.Match ("sausages"));  // sausages

This is not just a syntactic convenience: under the covers . . . This results in (up to 10 times) faster matching, at the expense of a small initial compilation cost (a few tens of microseconds).

Being curious, I wrote a benchmark. The program splits a string in 128 rounds of 1024 * 1024 calls (about 134 million calls in total) for each of three approaches: the static Regex method, a Regex instance, and String.Split as a non-regex way to perform the same task.

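using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

class Program {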
  static void Main(string[] args) {
    var str = "01/02/03/04/05/06/07/08/09/10";
    var regex = new Regex("/");
    var results = new List<Tuple<string, long>>();

    for (int j = 0; j < 128; j++) {
      var s = Stopwatch.StartNew();
      for (var i = 0; i < 1024 * 1024; i++) {
        RegexSplit(str);
      }
      s.Stop();
      results.Add(new Tuple<string, long>("Regex", s.ElapsedTicks));

      s = Stopwatch.StartNew();
      for (var i = 0; i < 1024 * 1024; i++) {
        CompiledRegexSplit(str, regex);
      }
      s.Stop();
      results.Add(new Tuple<string, long>("Compiled", s.ElapsedTicks));

      s = Stopwatch.StartNew();
      for (var i = 0; i < 1024 * 1024; i++) {
        StringSplit(str);
      }
      s.Stop();
      results.Add(new Tuple<string, long>("String", s.ElapsedTicks));

      Console.Write(".");
    }

    var resultsGroup = from it in results
      group it by it.Item1
      into g
      select new {
        Type = g.Key,
        Avg = g.Average(git => git.Item2)
      };

    resultsGroup.ToList().ForEach(it => Console.WriteLine("{0}: {1:000000000.00}", it.Type, it.Avg));
  }

  static void StringSplit(string str) {
    var split = str.Split('/');
  }

  static void CompiledRegexSplit(string str, Regex regex) {
    var split = regex.Split(str);
  }

  static void RegexSplit(string str) {
    var split = Regex.Split(str, "/");
  }
}

and got the following results:

Regex:    12257601.40
Compiled: 10869996.92
String:   01328636.27

That's not quite what I expected based on the book, and I doubt that instantiating one Regex takes 12 million ticks.

This run was in .NET 4.5, x64 release mode.

What explains the unexpected result?

asked Nov 01 '22 03:11 by jdphenix


1 Answer

Your code only creates an instance of a Regex object; it does not compile it. To get an actual compiled Regex, you must pass the RegexOptions.Compiled option. This tells the Regex object that it will be used heavily enough to justify the up-front cost of compiling itself so that it can execute more quickly.

The reason this is not done automatically is that, for a small number of runs, compiling the regular expression takes longer than the time it saves. The Regex object exists to hold a regular expression together with metadata such as engine options, so it can be used with or without compilation.
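To get a feel for that up-front cost, here is a rough sketch that times one constructor call with and without RegexOptions.Compiled. The class name ConstructionCost, the helper TicksToConstruct, and the throwaway "warm-up" pattern are made up for illustration; a single Stopwatch reading like this is noisy, so treat the numbers as an indication only.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ConstructionCost {
  // Times one constructor call. This is a one-shot measurement, so the
  // result is only a rough indication of the construction cost.
  public static long TicksToConstruct(string pattern, RegexOptions options) {
    var sw = Stopwatch.StartNew();
    var regex = new Regex(pattern, options);
    sw.Stop();
    GC.KeepAlive(regex); // keep the instance reachable until timing is done
    return sw.ElapsedTicks;
  }

  public static void Main() {
    // Warm-up (hypothetical pattern): load the regex types and JIT the
    // helper so that first-use overhead is not charged to either case.
    TicksToConstruct("warm-up", RegexOptions.None);
    TicksToConstruct("warm-up", RegexOptions.Compiled);

    Console.WriteLine("Interpreted: {0} ticks", TicksToConstruct("/", RegexOptions.None));
    Console.WriteLine("Compiled:    {0} ticks", TicksToConstruct("/", RegexOptions.Compiled));
  }
}
```

If the compiled construction comes out noticeably more expensive on your machine, that is the cost that a handful of matches will not pay back.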

The code to do the compilation would be:

var regex = new Regex("/", RegexOptions.Compiled);
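As a sketch of how the benchmark from the question could be extended, the program below times four variants side by side: the static call, a plain instance, an instance built with RegexOptions.Compiled, and String.Split. The class name SplitVariants, the helper Time, and the iteration count are assumptions chosen for illustration, not part of the original benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitVariants {
  const string Input = "01/02/03/04/05/06/07/08/09/10";

  // Runs the action `iterations` times and returns elapsed milliseconds.
  public static long Time(int iterations, Action action) {
    action(); // warm-up call so JIT and first-use costs are not counted
    var sw = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++) action();
    sw.Stop();
    return sw.ElapsedMilliseconds;
  }

  public static void Main() {
    var interpreted = new Regex("/");                     // instance, not compiled
    var compiled = new Regex("/", RegexOptions.Compiled); // instance, compiled to IL

    // Sanity check: both instances must split the input identically.
    if (!interpreted.Split(Input).SequenceEqual(compiled.Split(Input)))
      throw new InvalidOperationException("Split results differ");

    const int N = 1000000; // hypothetical iteration count
    Console.WriteLine("Static:       {0} ms", Time(N, () => Regex.Split(Input, "/")));
    Console.WriteLine("Instance:     {0} ms", Time(N, () => interpreted.Split(Input)));
    Console.WriteLine("Compiled:     {0} ms", Time(N, () => compiled.Split(Input)));
    Console.WriteLine("String.Split: {0} ms", Time(N, () => Input.Split('/')));
  }
}
```

The static and plain-instance timings may well stay close to each other, because the static Regex methods keep a small cache of recently used patterns (see Regex.CacheSize), so the pattern is not re-parsed on every call. How much RegexOptions.Compiled helps for a one-character pattern like this depends on the runtime and is best measured rather than assumed.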
answered Nov 13 '22 07:11 by AJ Henderson