Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stackoverflow doing boxing in C#

I have these two chunks of code in C#:

First

class Program {     static Stack<int> S = new Stack<int>();      static int Foo(int n) {         if (n == 0)             return 0;         S.Push(0);         S.Push(1);         ...         S.Push(999);         return Foo( n-1 );     } } 

Second

class Program {     static Stack S = new Stack();      static int Foo(int n) {         if (n == 0)             return 0;         S.Push(0);         S.Push(1);         ...         S.Push(999);         return Foo( n-1 );     } } 

They both do the same:

  1. Create a Stack (generic in <int> for the first example and a stack of object for the second one).

  2. Declare a method that calls itself recursively n times (n >= 0) and in every step push 1000 integers inside the created stack.

When I run the first example with Foo(30000) no exception occurs, however the second example crashes with Foo(1000), just n = 1000.

When I saw the CIL generated for both cases the only difference was the boxing part for every push:

First

IL_0030:  ldsfld     class [System]System.Collections.Generic.Stack`1<int32> Test.Program::S IL_0035:  ldc.i4     0x3e7 IL_003a:  callvirt   instance void class [System]System.Collections.Generic.Stack`1<int32>::Push(!0) IL_003f:  nop 

Second

IL_003a:  ldsfld     class [mscorlib]System.Collections.Stack Test.Program::S IL_003f:  ldc.i4     0x3e7 IL_0044:  box        [mscorlib]System.Int32 IL_0049:  callvirt   instance void [mscorlib]System.Collections.Stack::Push(object) IL_004e:  nop 

My question is: Why, if there is not significant overload of CIL's stack for the second example, does it crash "faster" than the first one?

like image 867
lcastillov Avatar asked Mar 04 '15 21:03

lcastillov


1 Answers

Why, if there is not significant overload of CIL's stack for second example, does it crash "faster" than the first one?

Note that the number of CIL instructions does not accurately represent the amount of work or memory that will be used. A single instruction can be very low impact, or very high impact, so counting CIL instructions is not an accurate way to measure "work".

Also realize that the CIL is not what gets executed. The JIT compiles the CIL to actual machine instructions, with an optimization phase, so the CIL may be very different than the actual executed instructions.

In the second case, since you're using a non-generic collection, every Push call requires the integer to be boxed, as you determined in CIL.

Boxing an integer effectively creates an object which "wraps" the Int32 for you. Instead of just loading a 32-bit integer onto the stack, it now has to load a 32-bit integer onto the stack, then box it, which effectively also loads an object reference onto the stack.

If you inspect this in the Disassembly window, you can see the difference between the generic and non-generic version is dramatic, and much more significant than the generated CIL would suggest.

The generic version effectively compiles to as a series of calls like so:

0000022c  nop              S.Push(25); 0000022d  mov         ecx,dword ptr ds:[03834978h]  00000233  mov         edx,19h  00000238  cmp         dword ptr [ecx],ecx  0000023a  call        71618DD0  0000023f  nop              S.Push(26); 00000240  mov         ecx,dword ptr ds:[03834978h]  00000246  mov         edx,1Ah  0000024b  cmp         dword ptr [ecx],ecx  0000024d  call        71618DD0  00000252  nop              S.Push(27); 

The non-generic, on the other hand, has to create the boxed objects, and instead compiles to:

00000645  nop              S.Push(25); 00000646  mov         ecx,7326560Ch  0000064b  call        FAAC20B0  00000650  mov         dword ptr [ebp-48h],eax  00000653  mov         eax,dword ptr ds:[03AF4978h]  00000658  mov         dword ptr [ebp+FFFFFEE8h],eax  0000065e  mov         eax,dword ptr [ebp-48h]  00000661  mov         dword ptr [eax+4],19h  00000668  mov         eax,dword ptr [ebp-48h]  0000066b  mov         dword ptr [ebp+FFFFFEE4h],eax  00000671  mov         ecx,dword ptr [ebp+FFFFFEE8h]  00000677  mov         edx,dword ptr [ebp+FFFFFEE4h]  0000067d  mov         eax,dword ptr [ecx]  0000067f  mov         eax,dword ptr [eax+2Ch]  00000682  call        dword ptr [eax+18h]  00000685  nop              S.Push(26); 00000686  mov         ecx,7326560Ch  0000068b  call        FAAC20B0  00000690  mov         dword ptr [ebp-48h],eax  00000693  mov         eax,dword ptr ds:[03AF4978h]  00000698  mov         dword ptr [ebp+FFFFFEE0h],eax  0000069e  mov         eax,dword ptr [ebp-48h]  000006a1  mov         dword ptr [eax+4],1Ah  000006a8  mov         eax,dword ptr [ebp-48h]  000006ab  mov         dword ptr [ebp+FFFFFEDCh],eax  000006b1  mov         ecx,dword ptr [ebp+FFFFFEE0h]  000006b7  mov         edx,dword ptr [ebp+FFFFFEDCh]  000006bd  mov         eax,dword ptr [ecx]  000006bf  mov         eax,dword ptr [eax+2Ch]  000006c2  call        dword ptr [eax+18h]  000006c5  nop  

Here you can see the significance of boxing.

In your case, boxing the integer causes the boxed object references to get loaded onto the stack. On my system, this is causing a stackoverflow on any calls larger than Foo(127) (in 32 bit), which suggests that the integers and boxed object references (4 bytes each) are all being kept on the stack, as 127*1000*8==1016000, which is dangerously close to the default 1 MB thread stack size for .NET applications.

When using the generic version, since there is no boxed object, the integers aren't having to all be stored on the stack, and the same register is being reused. This allows you to recurse significantly more (>40000 on my system) before you use up the stack.

Note that this will be CLR version and platform dependent, as there is a different JIT on x86/x64, as well.

like image 101
Reed Copsey Avatar answered Sep 19 '22 18:09

Reed Copsey