The String.Contains method looks like this internally
public bool Contains(string value)
{
return this.IndexOf(value, StringComparison.Ordinal) >= 0;
}
The IndexOf
overload that is called looks like this
public int IndexOf(string value, StringComparison comparisonType)
{
return this.IndexOf(value, 0, this.Length, comparisonType);
}
Here another call is made to the final overload, which then calls the relevant CompareInfo.IndexOf
method, with the signature
public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType)
Therefore, calling the final overload would be the fastest (albeit may be considered a micro optimization in most cases).
I may be missing something obvious but why does the Contains
method not call the final overload directly considering that no other work is done in the intermediate call and that the same information is available at both stages?
Is the only advantage that if the signature of the final overload changes, only one change needs to be made (that of the intermediate method), or is there more to the design than that?
Edit from the comments (see update 2 for speed difference explanation)
To clarify the performance differences I'm getting in case I've made a mistake somewhere:
I ran this benchmark (looped 5 times to avoid jitter bias) and used this extension method to compare against the String.Contains
method
public static bool QuickContains(this string input, string value)
{
return input.IndexOf(value, 0, input.Length, StringComparison.OrdinalIgnoreCase) >= 0;
}
with the loop looking like this
for (int i = 0; i < 1000000; i++)
{
bool containsStringRegEx = testString.QuickContains("STRING");
}
sw.Stop();
Console.WriteLine("QuickContains: " + sw.ElapsedMilliseconds);
In the benchmark test, QuickContains
seems about 50% faster than String.Contains
on my machine.
Update 2 (performance difference explained)
I've spotted something unfair in the benchmark which explains a lot. The benchmark itself was to measure case-insensitive strings but since String.Contains
can only perform case-sensitive searches, the ToUpper
method was included. This would skew the results, not in terms of final output, but at least in terms of simply measuring String.Contains
performance in non case-sensitive searches.
So now, if I use this extension method
public static bool QuickContains(this string input, string value)
{
return input.IndexOf(value, 0, input.Length, StringComparison.Ordinal) >= 0;
}
use StringComparison.Ordinal
in the 2 overload IndexOf
call and remove ToUpper
, the QuickContains
method actually becomes the slowest. IndexOf
and Contains
are pretty much on par in terms of performance. So clearly it was the ToUpper
call skewing the results of why there was such a discrepancy between Contains
and IndexOf
.
Not sure why the QuickContains
extension method has become the slowest. (Possibly related to the fact that Contains
has the [__DynamicallyInvokable, TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
attribute?).
Question still remains as to why the 4 overload method isn't called directly but it seems performance isn't impacted (as Adrian and delnan pointed out in the comments) by the decision.
Been a while (years) since I've looked at assembly, and I know close to nothing about MSIL and JIT, so it would be a nice exercise - couldn't resist, so here's just a bit of, possibly redundant, empirical data. Does the IndexOf
overload get inlined?
Here's a tiny Console app:
class Program
{
static void Main(string[] args)
{
"hello".Contains("hell");
}
}
The JIT generates this in an optimized Release build, Any CPU, running in 32 bit. I've shortened the addresses, and removed some irrelevant lines:
--- ...\Program.cs
"hello".Contains("hell");
[snip]
17 mov ecx,dword ptr ds:[0320219Ch] ; pointer to "hello"
1d mov edx,dword ptr ds:[032021A0h] ; pointer to "hell"
23 cmp dword ptr [ecx],ecx
25 call 680A6A6C ; String.Contains()
[snip]
The call
at 0x00000025 goes here:
String.Contains
00 push 0 ; startIndex = 0
02 push dword ptr [ecx+4] ; count = this.Length (second DWORD of String)
05 push 4 ; comparisonType = StringComparison.Ordinal
07 call FF9655A4 ; String.IndexOf()
0c test eax,eax
0e setge al ; if (... >= 0)
11 movzx eax,al
14 ret
Sure enough, it seems to call, directly, the final String.IndexOf
overload with four arguments: three push
ed; one in edx
(value
: "hell"); this
("hello") in ecx
. To confirm, this is where the call
at 0x00000005 goes:
00 push ebp
01 mov ebp,esp
03 push edi
04 push esi
05 push ebx
06 mov esi,ecx ; this ("hello")
08 mov edi,edx ; value ("hell")
0a mov ebx,dword ptr [ebp+10h]
0d test edi,edi ; if (value == null)
0f je 00A374D0
15 test ebx,ebx ; if (startIndex < 0)
17 jl 00A374FB
1d cmp dword ptr [esi+4],ebx ; if (startIndex > this.Length)
20 jl 00A374FB
26 cmp dword ptr [ebp+0Ch],0 ; if (count < 0)
2a jl 00A3753F
[snip]
... which would be the body of:
public int IndexOf(string value,
int startIndex,
int count,
StringComparison comparisonType)
{
if (value == null)
throw new ArgumentNullException("value");
if (startIndex < 0 || startIndex > this.Length)
throw new ArgumentOutOfRangeException("startIndex",
Environment.GetResourceString("ArgumentOutOfRange_Index"));
if (count < 0 || startIndex > this.Length - count)
throw new ArgumentOutOfRangeException("count",
Environment.GetResourceString("ArgumentOutOfRange_Count"));
...
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With