
Why is compiled RegEx performance slower than interpreted RegEx?


I ran into this article: Performance: Compiled vs. Interpreted Regular Expressions. I modified the sample code to compile 1000 regexes and then run each one 500 times to take advantage of precompilation. However, even in that case the interpreted regexes run 4 times faster!

This would mean the RegexOptions.Compiled option is completely useless, or even worse, slower! The big difference turned out to be due to JIT. After accounting for JIT, the compiled regex in the following code still performs a little slower, which doesn't make sense to me, but @Jim provided a much cleaner version in the answers that works as expected.

Can anyone explain why this is the case?

Code, taken & modified from the blog post:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace RegExTester
    {
        class Program
        {
            static void Main(string[] args)
            {
                DateTime startTime = DateTime.Now;

                for (int i = 0; i < 1000; i++)
                {
                    CheckForMatches("some random text with email address, [email protected]" + i.ToString());
                }

                double msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
                Console.WriteLine("Full Run: " + msTaken);

                startTime = DateTime.Now;

                for (int i = 0; i < 1000; i++)
                {
                    CheckForMatches("some random text with email address, [email protected]" + i.ToString());
                }

                msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
                Console.WriteLine("Full Run: " + msTaken);

                Console.ReadLine();
            }

            private static List<Regex> _expressions;
            private static object _SyncRoot = new object();

            private static List<Regex> GetExpressions()
            {
                if (_expressions != null)
                    return _expressions;

                lock (_SyncRoot)
                {
                    if (_expressions == null)
                    {
                        DateTime startTime = DateTime.Now;

                        List<Regex> tempExpressions = new List<Regex>();
                        string regExPattern =
                            @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@{0}$";

                        for (int i = 0; i < 2000; i++)
                        {
                            tempExpressions.Add(new Regex(
                                string.Format(regExPattern,
                                Regex.Escape("domain" + i.ToString() + "." +
                                (i % 3 == 0 ? ".com" : ".net"))),
                                RegexOptions.IgnoreCase));//  | RegexOptions.Compiled
                        }

                        _expressions = new List<Regex>(tempExpressions);
                        DateTime endTime = DateTime.Now;
                        double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
                        Console.WriteLine("Init:" + msTaken);
                    }
                }

                return _expressions;
            }

            static List<Regex> expressions = GetExpressions();

            private static void CheckForMatches(string text)
            {
                DateTime startTime = DateTime.Now;

                foreach (Regex e in expressions)
                {
                    bool isMatch = e.IsMatch(text);
                }

                DateTime endTime = DateTime.Now;
                //double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
                //Console.WriteLine("Run: " + msTaken);
            }
        }
    }
dr. evil asked May 14 '11 21:05



2 Answers

Compiled regular expressions match faster when used as intended. As others have pointed out, the idea is to compile them once and use them many times. The construction and initialization time are amortized out over those many runs.

I created a much simpler test that will show you that compiled regular expressions are unquestionably faster than not compiled.

    using System;
    using System.Diagnostics;               // Stopwatch
    using System.Text.RegularExpressions;

    class Program
    {
        const int NumIterations = 1000;
        const string TestString = "some random text with email address, [email protected]";
        const string Pattern = "^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@domain0\\.\\.com$";
        private static Regex NormalRegex = new Regex(Pattern, RegexOptions.IgnoreCase);
        private static Regex CompiledRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        private static Regex DummyRegex = new Regex("^.$");

        static void Main(string[] args)
        {
            // Runs IsMatch 'count' times against regex 'r' and prints the elapsed time.
            var DoTest = new Action<string, Regex, int>((s, r, count) =>
                {
                    Console.Write("Testing {0} ... ", s);
                    Stopwatch sw = Stopwatch.StartNew();
                    for (int i = 0; i < count; ++i)
                    {
                        bool isMatch = r.IsMatch(TestString + i.ToString());
                    }
                    sw.Stop();
                    Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
                });

            // Make sure that DoTest is JITed
            DoTest("Dummy", DummyRegex, 1);
            DoTest("Normal first time", NormalRegex, 1);
            DoTest("Normal Regex", NormalRegex, NumIterations);
            DoTest("Compiled first time", CompiledRegex, 1);
            DoTest("Compiled", CompiledRegex, NumIterations);

            Console.WriteLine();
            Console.Write("Done. Press Enter:");
            Console.ReadLine();
        }
    }

Setting NumIterations to 500 gives me this:

    Testing Dummy ... 0 ms
    Testing Normal first time ... 0 ms
    Testing Normal Regex ... 1 ms
    Testing Compiled first time ... 13 ms
    Testing Compiled ... 1 ms

With 5 million iterations, I get:

    Testing Dummy ... 0 ms
    Testing Normal first time ... 0 ms
    Testing Normal Regex ... 17,232 ms
    Testing Compiled first time ... 17 ms
    Testing Compiled ... 15,299 ms

Here you see that the compiled regular expression is about 11% faster than the non-compiled version.

It's interesting to note that if you remove the RegexOptions.IgnoreCase from your regular expression, the results from 5 million iterations are even more striking:

    Testing Dummy ... 0 ms
    Testing Normal first time ... 0 ms
    Testing Normal Regex ... 12,869 ms
    Testing Compiled first time ... 14 ms
    Testing Compiled ... 8,332 ms

Here, the compiled regular expression is 35% faster than the not compiled regular expression.
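
For readers who want to check the percentages, here is a minimal sketch of the arithmetic, using the timings quoted from the two 5-million-iteration runs. The class and method names (`SpeedupMath`, `PercentFaster`) are made up for illustration and are not part of the original test.

```csharp
using System;
using System.Globalization;

static class SpeedupMath
{
    // Percentage of the baseline time that the candidate saves.
    // (Hypothetical helper, not from the original answer.)
    public static double PercentFaster(double baselineMs, double candidateMs)
    {
        return (baselineMs - candidateMs) / baselineMs * 100.0;
    }

    static void Main()
    {
        // Timings copied from the 5-million-iteration runs quoted in the answer.
        double withIgnoreCase = PercentFaster(17232, 15299);
        double withoutIgnoreCase = PercentFaster(12869, 8332);

        Console.WriteLine(withIgnoreCase.ToString("F1", CultureInfo.InvariantCulture));    // 11.2
        Console.WriteLine(withoutIgnoreCase.ToString("F1", CultureInfo.InvariantCulture)); // 35.3
    }
}
```

These figures come from a single machine and a single run each, so they are best read as rough indications rather than precise measurements.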

In my opinion, the blog post you reference is simply a flawed test.

Jim Mischel answered Sep 18 '22 12:09


http://www.codinghorror.com/blog/2005/03/to-compile-or-not-to-compile.html

Compiled helps only if you instantiate it once and re-use it multiple times. If you're creating a compiled regex in the for loop then it obviously will perform worse. Can you show us your sample code?
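
To make the difference concrete, here is a minimal sketch that contrasts reusing one compiled Regex with constructing a new one on every loop pass. The pattern, the input string, and the names `RegexReuseDemo`, `CountWithReuse` and `CountWithRebuild` are assumptions chosen for illustration; they are not taken from the question's code.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexReuseDemo
{
    // Hypothetical pattern and input, chosen only for illustration.
    const string Pattern = @"^[a-z0-9]+[a-z0-9._%-]*@example\.com$";
    const string Input = "user@example.com";

    // Built once; the compilation cost is paid a single time.
    static readonly Regex Cached =
        new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static int CountWithReuse(int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (Cached.IsMatch(Input)) matches++;
        }
        return matches;
    }

    public static int CountWithRebuild(int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            // Anti-pattern: a new compiled Regex is constructed on every pass.
            var regex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
            if (regex.IsMatch(Input)) matches++;
        }
        return matches;
    }

    static void Main()
    {
        const int iterations = 200;
        CountWithReuse(1); // warm-up call so one-time JIT cost is not timed

        Stopwatch sw = Stopwatch.StartNew();
        CountWithReuse(iterations);
        Console.WriteLine("Reuse:   {0:N0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        CountWithRebuild(iterations);
        Console.WriteLine("Rebuild: {0:N0} ms", sw.ElapsedMilliseconds);
    }
}
```

Both methods return the same match count; only the elapsed time should differ, and the exact numbers will depend on the machine and runtime.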

Muhammad Hasan Khan answered Sep 19 '22 12:09