Why is regex.IsMatch(str) faster than str.EndsWith (invariant culture)?

This is some micro-benchmarking for a code path that is traversed a gazillion times per nanosecond and needs to be fast.

For the code snippet below, comparing

  • x.EndsWith(y, InvariantCulture)
  • Regex(y, Compiled | CultureInvariant).IsMatch(x)

I get the following figures:

=============================
Regex   : 00:00:01.2235890. Ignore this: 16666666
EndsWith: 00:00:03.2194626. Ignore this: 16666666
=============================
Regex   : 00:00:01.0979105. Ignore this: 16666666
EndsWith: 00:00:03.2346031. Ignore this: 16666666
=============================
Regex   : 00:00:01.0687845. Ignore this: 16666666
EndsWith: 00:00:03.3199213. Ignore this: 16666666

In other words, EndsWith takes roughly 3 times as long as Regex.

I should note that I experimented with other values, and depending on the strings used, sometimes EndsWith is faster and sometimes Regex is.

I initially assumed that EndsWith(x, InvariantCulture) boils down to some argument checking followed by extern int nativeCompareOrdinalEx(String, int, String, int, int), which I'd expect to be fast. However, as @nhahtdh correctly pointed out, in the case of InvariantCulture it calls CultureInfo.InvariantCulture.CompareInfo.IsSuffix, which calls InternalFindNLSStringEx. I had accidentally followed the Ordinal trail.

N.B.: I just found out that when calling EndsWith with Ordinal instead of InvariantCulture, EndsWith becomes much faster than Regex. Unfortunately there's no RegexOptions.Ordinal to compare it with.
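To make the Ordinal observation easy to reproduce, here is a minimal timing sketch. The class name EndsWithTiming, the helper Time and the reduced input set are made up for illustration and are not part of the original benchmark. The absolute numbers will vary by machine and runtime, so treat it as a way to compare the three variants side by side rather than as a definitive measurement.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class EndsWithTiming
{
    // Hypothetical helper: runs `test` `reps` times over `inputs` (round robin)
    // and returns the elapsed time; `hits` counts how often `test` returned true.
    public static TimeSpan Time(string[] inputs, Func<string, bool> test, int reps, out int hits)
    {
        hits = 0;
        var sw = Stopwatch.StartNew();
        for (int j = 0; j < reps; j++)
        {
            if (test(inputs[j % inputs.Length]))
            {
                hits++;
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // Two of the IDs from the question; assumed to be representative enough for a sketch.
        string[] ids = { "zxc@x@432143214@O@abcße", "qwe@x@432143214@XXabc" };
        const string suffix = "@abcße";
        var regex = new Regex("@abcße$", RegexOptions.Compiled | RegexOptions.CultureInvariant);
        const int reps = 2000000;
        int hits;

        var ordinal = Time(ids, s => s.EndsWith(suffix, StringComparison.Ordinal), reps, out hits);
        Console.WriteLine("EndsWith (Ordinal)         : " + ordinal + ". Hits: " + hits);

        var invariant = Time(ids, s => s.EndsWith(suffix, StringComparison.InvariantCulture), reps, out hits);
        Console.WriteLine("EndsWith (InvariantCulture): " + invariant + ". Hits: " + hits);

        var viaRegex = Time(ids, s => regex.IsMatch(s), reps, out hits);
        Console.WriteLine("Regex                      : " + viaRegex + ". Hits: " + hits);
    }
}
```

Passing the check as a Func<string, bool> adds a delegate call to every iteration, which is the same overhead for all three variants but may blur very small differences.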

I also expected the compiled regular expression to be fast, but how can it beat the specialized method?

The code:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

string[] BunchOfIDs =
{
    "zxc@x@432143214@O@abcße",
    "zxc@x@432143214@T@abcßX",
    "qwe@x@432143214@O@abcße",
    "qwe@x@432143214@XXabc",
    "zxc@x@1234@O@aXcße",
    "qwe@y@1234@O@aYcße",
};

var endsWith = "@abcße";
var endsWithRegex = new Regex("@abcße$", RegexOptions.None);

int reps = 20000000;
for (int i = 0; i < 3; i++)
{
    Console.WriteLine("=============================");
    int x = 0;
    var sw = Stopwatch.StartNew();
    for (int j = 0; j < reps; j++)
    {
        x += BunchOfIDs[j % BunchOfIDs.Length].EndsWith(endsWith, StringComparison.InvariantCulture) ? 1 : 2;
    }
    Console.WriteLine("EndsWith: " + sw.Elapsed + ". Ignore this: " + x);

    x = 0;
    sw = Stopwatch.StartNew();
    for (int j = 0; j < reps; j++)
    {
        x += endsWithRegex.IsMatch(BunchOfIDs[j % BunchOfIDs.Length]) ? 1 : 2;
    }
    Console.WriteLine("Regex   : " + sw.Elapsed + ". Ignore this: " + x);
}
Evgeniy Berezovsky asked Jan 15 '15 06:01



1 Answer

It might be because StringComparison.InvariantCulture != RegexOptions.CultureInvariant!

This snippet

var str = "ss";
var endsWith = "ß";
var endsWithRegex = new Regex("ß$",
    RegexOptions.Compiled | RegexOptions.CultureInvariant);
Console.WriteLine(str.EndsWith(endsWith, StringComparison.InvariantCulture)
    + " vs "
    + endsWithRegex.IsMatch(str));

prints

True vs False

So it looks like RegexOptions.CultureInvariant does not imply things implied by StringComparison.InvariantCulture. Is RegexOptions.CultureInvariant perhaps more like StringComparison.Ordinal?
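To explore that question, the following sketch puts the three checks side by side on the same input. The class and method names (SuffixChecks, Linguistic, Ordinal, ViaRegex) are made up for illustration. As far as I know, RegexOptions.CultureInvariant only influences how case-insensitive matching is done, so without RegexOptions.IgnoreCase the regex is expected to compare character by character, much like an ordinal check. The InvariantCulture result may depend on the runtime and its globalization backend, so no output is claimed for it here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixChecks
{
    // Linguistic comparison: may treat "ß" and "ss" as equivalent.
    public static bool Linguistic(string s, string suffix)
    {
        return s.EndsWith(suffix, StringComparison.InvariantCulture);
    }

    // Ordinal comparison: compares the UTF-16 code units one by one.
    public static bool Ordinal(string s, string suffix)
    {
        return s.EndsWith(suffix, StringComparison.Ordinal);
    }

    // Regex anchored at the end of the input; the suffix is escaped so it is matched literally.
    public static bool ViaRegex(string s, string suffix)
    {
        var regex = new Regex(Regex.Escape(suffix) + "$", RegexOptions.CultureInvariant);
        return regex.IsMatch(s);
    }

    public static void Main()
    {
        string str = "ss";
        string suffix = "ß";

        Console.WriteLine("InvariantCulture: " + Linguistic(str, suffix));
        Console.WriteLine("Ordinal         : " + Ordinal(str, suffix));
        Console.WriteLine("Regex           : " + ViaRegex(str, suffix));
    }
}
```

If the Ordinal and Regex lines agree on every input you care about while the InvariantCulture line differs, the benchmark in the question is comparing two different kinds of work, and StringComparison.Ordinal is the fairer counterpart to the regex.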

Evgeniy Berezovsky answered Nov 15 '22 22:11
