
Regex.IsMatch() vs. String.ToUpper().Contains() performance

Tags: string, c#, regex

Since .NET has no case-insensitive string.Contains() (even though a case-insensitive version of string.Equals() exists, which baffles me, but I digress), what is the performance difference between using Regex.IsMatch() and using String.ToUpper().Contains()?

Example:

string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

bool containsStringRegex = Regex.IsMatch(testString, "string", RegexOptions.IgnoreCase);
bool containsString = testString.ToUpper().Contains("STRING");

I've always heard that string.ToUpper() is a very expensive call, so I shy away from it when I want to do string.Contains() comparisons. How does Regex.IsMatch() compare in terms of performance?

Is there a more efficient approach for doing such comparisons?

Saggio asked Jul 10 '13 19:07


2 Answers

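Here's a benchmark comparing the two approaches from the question, plus String.IndexOf with StringComparison.OrdinalIgnoreCase: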

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();

        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

        // Note: the timer starts before the Regex is constructed, so the RX
        // figure includes the one-time cost of compiling the pattern.
        sw.Start();
        var re = new Regex("string", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
        for (int i = 0; i < 1000000; i++)
        {
            bool containsString = re.IsMatch(testString);
        }
        sw.Stop();
        Console.WriteLine("RX: " + sw.ElapsedMilliseconds);

        // ToUpper() allocates a new upper-cased string on every iteration.
        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            bool containsString = testString.ToUpper().Contains("STRING");
        }
        sw.Stop();
        Console.WriteLine("Contains: " + sw.ElapsedMilliseconds);

        // IndexOf with OrdinalIgnoreCase searches the original string directly.
        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            bool containsString = testString.IndexOf("STRING", StringComparison.OrdinalIgnoreCase) >= 0;
        }
        sw.Stop();
        Console.WriteLine("IndexOf: " + sw.ElapsedMilliseconds);
    }
}

Results for 1,000,000 iterations of each approach, fastest first:

IndexOf (183 ms), Contains (400 ms), Regex (477 ms)

(Times updated to use the compiled Regex.)
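
One caveat when reading numbers like these: the benchmark above starts the stopwatch before the Regex is constructed and runs no warm-up pass, so the regex figure includes one-time setup cost. The following is a minimal sketch of a fairer measurement, not a definitive methodology. The class name ContainsTimings and the helpers TimeMs and Run are hypothetical names introduced for illustration; they are not part of the original answer or of .NET. Going through a delegate adds a small, roughly equal overhead to all three measurements.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ContainsTimings
{
    // Hypothetical helper: makes one untimed warm-up call, then times
    // `iterations` further calls of `check` and returns elapsed milliseconds.
    public static long TimeMs(Func<bool> check, int iterations)
    {
        bool sink = check();              // warm-up: JIT and lazy initialisation
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink ^= check();              // consume the result of every call
        }
        sw.Stop();
        GC.KeepAlive(sink);
        return sw.ElapsedMilliseconds;
    }

    // Hypothetical driver: prints one timing per approach.
    public static void Run(int iterations)
    {
        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

        // Constructed outside the timed region, so compilation is not measured.
        var re = new Regex("string",
            RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

        Console.WriteLine("RX: " + TimeMs(() => re.IsMatch(testString), iterations));
        Console.WriteLine("Contains: " + TimeMs(() => testString.ToUpper().Contains("STRING"), iterations));
        Console.WriteLine("IndexOf: " + TimeMs(
            () => testString.IndexOf("STRING", StringComparison.OrdinalIgnoreCase) >= 0, iterations));
    }
}
```

Calling ContainsTimings.Run(1000000) from Main mirrors the iteration count used above. Absolute times will vary by machine and runtime version, so treat the ordering as something to verify locally rather than a guarantee.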

keyboardP answered Oct 03 '22 23:10


Another option is String.IndexOf(String, StringComparison), which might be more efficient than either of the two you suggested:

string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";
bool contained = testString.IndexOf("string", StringComparison.OrdinalIgnoreCase) >= 0;

If you need a culture-sensitive comparison, use CurrentCultureIgnoreCase instead of OrdinalIgnoreCase.
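
If this check is needed in many places, the IndexOf call can be wrapped in an extension method so that call sites read like Contains. This is only a sketch: the class name StringContainsExtensions and the method name ContainsIgnoreCase are hypothetical, not part of the .NET base class library. The comparison defaults to OrdinalIgnoreCase and can be overridden with CurrentCultureIgnoreCase when culture-sensitive matching is required.

```csharp
using System;

public static class StringContainsExtensions
{
    // Hypothetical helper: returns true when `value` occurs in `source`
    // under the given comparison (ordinal, case-insensitive by default).
    public static bool ContainsIgnoreCase(
        this string source,
        string value,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (value == null) throw new ArgumentNullException(nameof(value));

        // IndexOf returns -1 when no match is found.
        return source.IndexOf(value, comparison) >= 0;
    }
}
```

With this in scope, testString.ContainsIgnoreCase("string") replaces the explicit IndexOf(...) >= 0 test. On newer runtimes (.NET Core 2.1 and later, .NET Standard 2.1), String.Contains(String, StringComparison) is built in and makes such a helper unnecessary.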

Douglas answered Oct 04 '22 01:10