Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does String.Contains work? [duplicate]

Possible Duplicate:
What algorithm .Net use for searching a pattern in a string?

I have a loop in my program that gets a line from a file. Then there is a check to whether the line contains a string

if(line.Contains("String"))
{
    //Do other stuff
}

There are over 2 million rows in the file so if I can quicken the speed by even 1/10th millisecond then this would save me over 3 minutes on each run.

So... Say a line is 1000 chars long, is it quicker to look for a short or long string, or does it not make a difference?

line.Contains("ABCDEFGHIJKLMNOPQRSTUVWXYZ");

or

line.Contains("ABCDEFG")

Thank you in advance.

like image 647
Ash Burlaczenko Avatar asked Sep 22 '10 13:09

Ash Burlaczenko


1 Answers

String.Contains() follows a torturous route through System.Globalization.CompareInfo into the CLR and the NLS support sub-system where I thoroughly got lost. This contains highly optimized code with impressive perf. The only way to do it faster is through pinvoking the standard CRT function wcsstr, available in msvcrt.dll

    [DllImport("msvcrt.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr wcsstr(string toSearch, string toFind);

It returns IntPtr.Zero if the string wasn't found. I made some measurements, using String.IndexOf() instead of Contains() to test the various string comparison options. All times are in nanoseconds, searching for a string of 7 characters in a string of 60 characters. With the string not present so worst case is measured. The lowest time out of a sample of 20 was used:

StringComparison.Ordinal (same as Contains) : 245 nanoseconds
StringComparison.OrdinalIgnoreCase : 327
StringComparison.InvariantCulture : 251
StringComparison.InvariantCultureIgnoreCase : 327
StringComparison.CurrentCulture : 275
StringComparison.CurrentCultureIgnoreCase : 340
wcsstr : 213

Very impressive numbers and on par with what you'd expect these functions to require. The wcsstr() function does the same kind of ordinal comparison as String.Compare(). It is only 13% faster, a statistically insignificant improvement given that real perf is not likely to approach these measurements due to the effects of CPU cache locality. I can only conclude that you're going about as fast as you can expect. Whether the slight improvement from wcsstr is worth it is up to you.

like image 157
Hans Passant Avatar answered Oct 10 '22 22:10

Hans Passant