I've noticed a performance hit of iterating over a primitive collection (T[]) that has been cast to a generic interface collection (IList or IEnumberable).
For example:
private static int Sum(int[] array)
{
int sum = 0;
foreach (int i in array)
sum += i;
return sum;
}
The above code executes significantly faster than the code below, where the parameter is changed to type IList (or IEnumerable):
private static int Sum(IList<int> array)
{
int sum = 0;
foreach (int i in array)
sum += i;
return sum;
}
The performance hit still occurs if the object passed is a primitive array, and if I try changing the loop to a for loop instead of a foreach loop.
I can get around the performance hit by coding it like such:
private static int Sum(IList<int> array)
{
int sum = 0;
if( array is int[] )
foreach (int i in (int[])array)
sum += i;
else
foreach (int i in array)
sum += i;
return sum;
}
Is there a more elegant way of solving this issue? Thank you for your time.
Edit: My benchmark code:
static void Main(string[] args)
{
int[] values = Enumerable.Range(0, 10000000).ToArray<int>();
Stopwatch sw = new Stopwatch();
sw.Start();
Sum(values);
//Sum((IList<int>)values);
sw.Stop();
Console.WriteLine("Elasped: {0} ms", sw.ElapsedMilliseconds);
Console.Read();
}
Welcome to optimization. Things aren't always obvious here!
Basically, as you've found, when the compiler detects that certain types of safety constraints are proven to hold, it can issue enormously more efficient code when doing full optimization. Here (as MagnatLU shows) we see that knowing you've got an array allows all sorts of assumptions to be made about the size being fixed, and it allows memory to be accessed directly (which is also maximally efficient in how it integrates with the CPU's memory prefetch code, for bonus speed). When the compiler doesn't have the proof that it can generate super-fast code, it plays it safe. (This is the right thing to do.)
As a general comment, your workaround code is pretty simple when it comes to coding for optimization (when making the code super-readable and maintainable isn't always the first consideration). I don't really see how you could better it without making your class's API more complex (not a win!). Moreover, just adding a comment inside the body to say why you've done this would solve the maintenance issue; this is in fact one of the best uses for (non-doc) comments in the code in the first place. Given that the use case is large arrays (i.e., that it's reasonable to optimize at all in the first place) I'd say you have a great solution right there.
Your best bet is to create overload for Sum
with int[]
as argument if this method is performance-critical. CLR's JIT can detect foreach-style iteration over array and can skip range checking and address each element directly. Each iteration of loop takes 3-5 instructions on x86, with only one memory lookup.
When using IList, JIT does not have knowledge about underlying collection's structure and ends up using IEnumerator<int>
. Each iteration of loop uses two interface invocation - one for Current
, one for MoveNext
(2-3 memory lookups and a call for each of those). This most likely ends up with ~20 quite expensive instructions and there is not much you can do about it.
Edit: If you are curious about actual machine code emitted by JIT from Release build without debugger attached, here it is.
int s = 0;
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 xor esi,esi
foreach (int i in arg)
00000007 xor edx,edx
00000009 mov edi,dword ptr [ecx+4]
0000000c test edi,edi
0000000e jle 0000001B
00000010 mov eax,dword ptr [ecx+edx*4+8]
s += i;
00000014 add esi,eax
00000016 inc edx
foreach (int i in arg)
00000017 cmp edi,edx
00000019 jg 00000010
int s = 0;
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 push ebx
00000006 sub esp,1Ch
00000009 mov esi,ecx
0000000b lea edi,[ebp-28h]
0000000e mov ecx,6
00000013 xor eax,eax
00000015 rep stos dword ptr es:[edi]
00000017 mov ecx,esi
00000019 xor eax,eax
0000001b mov dword ptr [ebp-18h],eax
0000001e xor edx,edx
00000020 mov dword ptr [ebp-24h],edx
foreach (int i in arg)
00000023 call dword ptr ds:[009E0010h]
00000029 mov dword ptr [ebp-28h],eax
0000002c mov ecx,dword ptr [ebp-28h]
0000002f call dword ptr ds:[009E0090h]
00000035 test eax,eax
00000037 je 00000052
00000039 mov ecx,dword ptr [ebp-28h]
0000003c call dword ptr ds:[009E0110h]
s += i;
00000042 add dword ptr [ebp-24h],eax
foreach (int i in arg)
00000045 mov ecx,dword ptr [ebp-28h]
00000048 call dword ptr ds:[009E0090h]
0000004e test eax,eax
00000050 jne 00000039
00000052 mov dword ptr [ebp-1Ch],0
00000059 mov dword ptr [ebp-18h],0FCh
00000060 push 0F403BCh
00000065 jmp 00000067
00000067 cmp dword ptr [ebp-28h],0
0000006b je 00000076
0000006d mov ecx,dword ptr [ebp-28h]
00000070 call dword ptr ds:[009E0190h]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With