I have the following code:
static void Main(string[] args)
{
    const string RegXPattern = @"/api/(?<controller>\w+)/(?<action>\w+)/?$";
    var regex = new Regex(RegXPattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    const string InputToMatch = "/api/person/load";
    regex.IsMatch(InputToMatch); // Warmup
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000000; i++)
    {
        var match = regex.IsMatch(InputToMatch);
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed.ToString());
    Console.ReadLine();
}
Running the above on my machine in a Release build finishes in around 18 seconds, and removing RegexOptions.Compiled makes it run in 13 seconds.
My understanding was that including this flag would make matching faster, but in my example it results in roughly 30% worse performance.
What am I missing here?
The problem is that the compiled Regex version does a char-by-char comparison using the current culture, of the form
if .... char.ToLower(runtext[index2], CultureInfo.CurrentCulture) == 'c' ....
where the thread-static CultureInfo.CurrentCulture is retrieved again for each character.
This shows up in the profiler as a noticeable CPU consumer.
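I have filed an issue for .NET Core and fixed it with a PR. If you need that merged back to the regular .NET Framework, you should file an issue on GitHub to request a backport. The issue shows up for compiled Regex instances that have certain option combinations set, such as the RegexOptions.IgnoreCase | RegexOptions.Compiled used in the question.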
The seemingly strange option RegexOptions.CultureInvariant | RegexOptions.Compiled is in fact necessary if you create a regular expression on a thread with a specific locale that has special casing or number separators. The Regex match expression will be created specifically according to your current locale. If you want a locale-independent Regex, then you need to use RegexOptions.CultureInvariant.
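As a minimal sketch of the point above, the regex from the question can be built with RegexOptions.CultureInvariant added. The class name, the helper method and the choice of tr-TR as an example locale with special casing rules are assumptions made for illustration only; creating that culture may fail on runtimes configured for invariant globalization.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original question or answer.
public static class CultureInvariantRegexDemo
{
    // Same pattern as in the question.
    private const string Pattern = @"/api/(?<controller>\w+)/(?<action>\w+)/?$";

    // Builds a case-insensitive, compiled regex whose case handling
    // does not depend on the culture of the creating thread.
    public static Regex CreateInvariantRegex()
    {
        return new Regex(
            Pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    }

    public static void Main()
    {
        // Assumption: simulate a thread running under a locale with special casing.
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");

        Regex regex = CreateInvariantRegex();
        Console.WriteLine(regex.IsMatch("/API/person/load"));
    }
}
```

Whether this also removes the per-character culture lookup on a given runtime is something to confirm with a profiler or a timing run rather than take for granted.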
I think it is the RegexOptions.IgnoreCase that is causing the slowdown here. These are my timings for comparison:
Compiled: 11s
Not compiled: 10s
Using the inline modifier (?i) in the regex gives these results:
Compiled: 10s
Not compiled: 9s
Not including the case comparison in the regex (by using /API/(?<controller>\w+)/(?<action>\w+)/?$ as the pattern, and .ToUpper() on the input so that the same number of matches are done):
Compiled: 6s
Not compiled: 8s
Taking this one step further (as suggested by Antonín) and using the case-insensitive pattern /[aA][pP][iI]/(?<controller>\w+)/(?<action>\w+)/?$ gives:
Compiled: 5s
Not compiled: 8s
From this, the fastest of them all is using RegexOptions.Compiled, but dealing with the casing of the /api/ prefix using pattern matching in the regex.
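A rough way to reproduce this comparison on your own machine is sketched below. The class name and the Time helper are hypothetical, and the loop mirrors the one in the question. Absolute numbers will differ by runtime version and hardware, so treat it as a starting point and not as a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing harness; names are illustrative only.
public static class RegexVariantTiming
{
    private const string Input = "/api/person/load";

    // Runs one warm-up match, then times `iterations` calls to IsMatch.
    public static TimeSpan Time(Regex regex, string input, int iterations)
    {
        regex.IsMatch(input); // warm-up
        int matches = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (regex.IsMatch(input)) matches++;
        }
        sw.Stop();

        // Guard against accidentally timing a pattern that never matches.
        if (matches != iterations)
            throw new InvalidOperationException("Input did not match the pattern.");
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 10000000;
        const string tail = @"/(?<controller>\w+)/(?<action>\w+)/?$";

        // Variant from the question: the engine handles casing.
        var ignoreCase = new Regex("/api" + tail, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        // Variant from this answer: casing of the prefix is spelled out in the pattern.
        var charClasses = new Regex("/[aA][pP][iI]" + tail, RegexOptions.Compiled);

        Console.WriteLine("IgnoreCase | Compiled:        " + Time(ignoreCase, Input, iterations));
        Console.WriteLine("Character classes | Compiled: " + Time(charClasses, Input, iterations));
    }
}
```

Note that the character-class form only covers the ASCII upper and lower case letters of the prefix, which is all this route needs.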
To verify these results, I've also run them using a set of randomised (but still matching) inputs. Here are the results:
IgnoreCase | Compiled: 13s
IgnoreCase: 11s
(?i) plus Compiled: 13s
(?i): 11s
Compiled plus external case handling: 9s
External case handling: 12s
Case handling in regex plus Compiled: 8s
Case handling in regex: 11s
As to why this is slower, this blog post discusses a possible reason.