
Why does C# RegexOptions.Compiled make the match slower?

I have the following code:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        const string RegXPattern = @"/api/(?<controller>\w+)/(?<action>\w+)/?$";
        var regex = new Regex(RegXPattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

        const string InputToMatch = "/api/person/load";

        regex.IsMatch(InputToMatch); // Warmup: the first call pays the one-time compilation cost

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000000; i++)
        {
            var match = regex.IsMatch(InputToMatch);
        }
        sw.Stop();

        Console.WriteLine(sw.Elapsed.ToString());
        Console.ReadLine();
    }
}

Running the above on my machine in Release mode finishes in around 18 seconds; removing RegexOptions.Compiled makes it run in 13 seconds.

My understanding was that including this flag would make matching faster, but in my example it results in ~30% lower performance.

What am I missing here?

MaYaN asked Dec 03 '16 23:12


2 Answers

The problem is that the compiled Regex version does a character-by-character comparison using the current culture, of the form

if .... char.ToLower(runtext[index2], CultureInfo.CurrentCulture) == 'c' ....

where the thread-static CultureInfo.CurrentCulture is retrieved for each character.

This shows up in the profiler as a CPU consumer.

I have filed an issue for .NET Core and fixed it with a PR. If you need that merged back into the regular .NET Framework, you should file an issue on GitHub to request a backport. The issue shows up for all compiled Regex instances that have one of the following option combinations set:

  • RegexOptions.IgnoreCase | RegexOptions.Compiled
  • RegexOptions.CultureInvariant | RegexOptions.Compiled
  • RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Compiled

The seemingly strange option combination RegexOptions.CultureInvariant | RegexOptions.Compiled is in fact necessary if you create a regular expression on a thread with a specific locale that has special casing or number separators. The Regex match expression is created according to your current locale. If you want a locale-independent Regex, you need to use RegexOptions.CultureInvariant.
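To make the locale point concrete, here is a minimal sketch that builds the same case-insensitive pattern with and without RegexOptions.CultureInvariant. The choice of tr-TR is my own assumption for illustration (its casing rules for 'i' and 'I' differ from the invariant culture), and the class name is hypothetical. The results can vary between .NET versions and globalization settings, so no expected output is shown.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Threading;

static class CultureInvariantSketch
{
    static void Main()
    {
        // Assumption: tr-TR is picked only because its 'i'/'I' casing differs
        // from the invariant culture. Any culture with special casing would do.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

        // Built while tr-TR is current: case-insensitive matching follows that culture.
        var cultureSensitive = new Regex("^/api/",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);

        // CultureInvariant asks the engine to ignore the thread's current culture.
        var cultureInvariant = new Regex("^/api/",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

        const string input = "/API/person/load";
        Console.WriteLine("culture-sensitive: " + cultureSensitive.IsMatch(input));
        Console.WriteLine("culture-invariant: " + cultureInvariant.IsMatch(input));
    }
}
```

If the two lines differ on your machine, the pattern was specialised for the thread's locale at construction time, which is the behaviour described above.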

Alois Kraus answered Sep 18 '22 00:09


I think it is RegexOptions.IgnoreCase that is causing the slowdown here. These are my timings for comparison:

Compiled     11s
Not compiled 10s

Using the inline modifier (?i) in the regex gives these results:

Compiled     10s
Not compiled 9s

Not including the case comparison in the regex (by using /API/(?<controller>\w+)/(?<action>\w+)/?$ as the pattern, and .ToUpper() on the input so that the same number of matches are done):

Compiled     6s
Not compiled 8s

Taking this one step further (as suggested by Antonín) and using the case-insensitive pattern /[aA][pP][iI]/(?<controller>\w+)/(?<action>\w+)/?$ gives:

Compiled     5s
Not compiled 8s

From this, the fastest of them all is using RegexOptions.Compiled, but dealing with the casing of the /api/ prefix using pattern matching in the regex.
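A rough way to reproduce this comparison yourself is sketched below. It is not the exact harness used for the timings above: the Time helper, the class name, and the iteration count are assumptions of mine, and absolute numbers will depend on the machine and the .NET version.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexTimingSketch
{
    // Hypothetical helper: times `iterations` calls to IsMatch for one regex.
    static TimeSpan Time(Regex regex, string input, int iterations, out int matches)
    {
        regex.IsMatch(input); // warm-up, as in the question
        matches = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (regex.IsMatch(input)) matches++; // use the result so the call is not dead code
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        const string input = "/api/person/load";
        const int iterations = 1000000; // assumption: smaller than the question's 10,000,000

        // Variant from the question: case-insensitivity via RegexOptions.IgnoreCase.
        var ignoreCase = new Regex(@"/api/(?<controller>\w+)/(?<action>\w+)/?$",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);

        // Variant from this answer: casing of the prefix handled inside the pattern.
        var explicitCase = new Regex(@"/[aA][pP][iI]/(?<controller>\w+)/(?<action>\w+)/?$",
            RegexOptions.Compiled);

        int count;
        TimeSpan t1 = Time(ignoreCase, input, iterations, out count);
        Console.WriteLine("IgnoreCase | Compiled:   " + t1 + " (" + count + " matches)");
        TimeSpan t2 = Time(explicitCase, input, iterations, out count);
        Console.WriteLine("[aA][pP][iI] + Compiled: " + t2 + " (" + count + " matches)");
    }
}
```

Build in Release mode and run without a debugger attached, otherwise the timings are not comparable to the ones listed here.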

To verify these results, I've also run them using a set of randomised (but still matching) inputs. Here are the results:

IgnoreCase | Compiled                13s
IgnoreCase                           11s
(?i) plus Compiled                   13s
(?i)                                 11s
Compiled plus external case handling 9s
External case handling               12s
Case handling in regex plus Compiled 8s
Case handling in regex               11s
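For anyone wanting to repeat the randomised run, one possible way to generate inputs that still match the pattern is sketched below. This is only an assumption about how such inputs could be built, not the generator used for the numbers above; RandomWord, MakeInputs, the fixed seed, and the word lengths are all hypothetical.

```csharp
using System;
using System.Text;

static class RandomInputSketch
{
    // Hypothetical helper: builds a random word of ASCII letters, which \w+ will match.
    static string RandomWord(Random rng, int length)
    {
        const string letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            sb.Append(letters[rng.Next(letters.Length)]);
        }
        return sb.ToString();
    }

    // Hypothetical helper: produces inputs of the form /api/<controller>/<action>.
    static string[] MakeInputs(int count, int seed)
    {
        var rng = new Random(seed); // fixed seed so every variant sees the same inputs
        var inputs = new string[count];
        for (int i = 0; i < count; i++)
        {
            // Assumption: words of 3 to 11 letters; the /api/ prefix is kept lowercase.
            inputs[i] = "/api/" + RandomWord(rng, rng.Next(3, 12))
                      + "/" + RandomWord(rng, rng.Next(3, 12));
        }
        return inputs;
    }

    static void Main()
    {
        foreach (string input in MakeInputs(5, 42))
        {
            Console.WriteLine(input);
        }
    }
}
```

Generating the inputs once, outside the timed loop, keeps the cost of string building out of the measurement.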

As to why this is slower, this blog post discusses a possible reason.

adrianbanks answered Sep 19 '22 00:09