Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird stackoverflow in c# when allocating reference types

While doing some fancy code generation, I've encountered a stack overflow that I don't understand.

My code is basically like this:

static Tuple<string, int>[] DoWork() 
{
    // [ call some methods ]
    Tuple<string, int>[] tmp = new Tuple<string, int>[100];
    tmp[0] = new Tuple<string, int>("blah 1", 0);
    tmp[1] = new Tuple<string, int>("blah 2", 1);
    tmp[2] = new Tuple<string, int>("blah 3", 2);
    // ...
    tmp[99] = new Tuple<string, int>("blah 99", 99);
    return tmp;
}

If you use small numbers like here (100) everything works fine. If the numbers are large, strange things happens. In my case, I tried emitting approximately 10K lines of code like this, which triggered a stack overflow exception.

So... why do I think this is strange:

  • tmp is a local of a reference type, and therefore I expect only the pointer to be allocated on the heap.
  • Tuples are reference types and allocated on the heap.
  • No recursion or other weirdness; afaik the storage requirements on the heap should be limited.

Reproducing the strangeness...

I cannot reproduce the stackoverflow in a minimum test case, but I did notice it seems to be triggered on 64-bit .NET 4.5. What I can give is some evidence that demonstrates what's going on.

Also note that the real code uses Reflection.Emit code that does this code generation... it's not like the code itself has all these lines of code... The emitted IL code is correct BTW.

In Visual Studio - put a breakpoint on the last line. Notice the use of the stack pointer in the disassembly (ASM, not IL).

Now add a new line to the code -- e.g. tmp[100] = // the usuals. Put a breakpoint here as well and notice that the used stack space grows.

As for an attempt to reproduce using a minimum test-case using Reflection.Emit, this is the code (which DOES NOT reproduce the problem strangely enough -- but is very close to what I've done to trigger the stack overflow... it should give a bit of a picture what I'm trying to do, and perhaps someone else can produce a viable test case using this). Here goes:

public static void Foo()
{
    Console.WriteLine("Foo!");
}

static void Main(string[] args)
{
    // all this just to invoke one opcode with no arguments!
    var assemblyName = new AssemblyName("MyAssembly");

    var assemblyBuilder =
        AppDomain.CurrentDomain.DefineDynamicAssembly(assemblyName,
        AssemblyBuilderAccess.RunAndCollect);

    // Create module
    var moduleBuilder = assemblyBuilder.DefineDynamicModule("MyModule");

    var type = moduleBuilder.DefineType("MyType", TypeAttributes.Public, typeof(object));

    var method = type.DefineMethod("Test", System.Reflection.MethodAttributes.Public | System.Reflection.MethodAttributes.Static, System.Reflection.CallingConventions.Standard, typeof(Tuple<string, int>[]), new Type[0]);

    ILGenerator gen = method.GetILGenerator();
    int count = 0x10000;

    gen.Emit(OpCodes.Call, typeof(StackOverflowGenerator).GetMethod("Foo"));

    var loc = gen.DeclareLocal(typeof(Tuple<string, int>[]));
    gen.Emit(OpCodes.Ldc_I4, count);
    gen.Emit(OpCodes.Newarr, typeof(Tuple<string, int>));
    gen.Emit(OpCodes.Stloc, loc);

    for (int i = 0; i < count; ++i)
    {
        // Load array
        gen.Emit(OpCodes.Ldloc, loc);
        gen.Emit(OpCodes.Ldc_I4, i);

        // Construct tuple:
        gen.Emit(OpCodes.Ldstr, "This is the string");
        gen.Emit(OpCodes.Ldc_I4, i);
        gen.Emit(OpCodes.Newobj, typeof(Tuple<string, int>).GetConstructor(new[] { typeof(string), typeof(int) }));

        // Store in the array
        gen.Emit(OpCodes.Stelem_Ref);
    }

    // Return the result
    gen.Emit(OpCodes.Ldloc, loc);
    gen.Emit(OpCodes.Ret);

    var materialized = type.CreateType();

    var tmp = checked((Tuple<string, int>[])materialized.GetMethod("Test").Invoke(null, new object[0]));

    int total = 0;
    foreach (var item in tmp)
    {
        total += item.Item1.Length + item.Item2;
    }
    Console.WriteLine("Total: {0}", total);
    Console.ReadLine();
}

My question

How on earth can something like this produce a SOE? What's going on here? Why are things put on the stack in this context anyways?

like image 530
atlaste Avatar asked May 21 '15 19:05

atlaste


1 Answers

There are some problems with your generated code, but the deeper problem lies in the JIT engine

tl;dr

Every new operator in a function requires a DWORD at the stack, even new object(), that will be present regardless of optimization and release/debug mode! This effectively means that you are limited in the amount of times the new keyword is present in a function, according to your stack size.

What causes the problem?

The SOF is caused because the JIT generates code that tries to allocate too much space on the stack (using sub esp <number>). The JIT chooses how much to allocate upon inspecting the usage of the stack in the function. If you have many local variables, your function will have to use more memory on the stack, and the JIT has no way of knowing how large will the stack be at runtime, so it crashes at runtime. A temporary solution might be to make the stack larger using compiler flags or such.

Who's fault is it?

Your code doesn't use many variable on the stack, in fact, you explicitly use only one, the pointer to the array.

However, your code (when used without optimizations) creates many "temporary one-time" variables, each for each string and each integer that you use in the new Tuple<...>. They will disappear with optimization turned on.

i.e, instead of something like this:

var x = new Tuple<string, int>("blah 1", 0);
tmp[0] = x;
x = new Tuple<string, int>("blah 2", 1);
tmp[1] = x;

You end up with something like this:

var str1 = "blah 1";
var int1 = 0;
var x = new Tuple<string, int>(str1, int1);
tmp[0] = x;
var str2 = "blah 2";
var int2 = 1;
var x2 = new Tuple<string, int>(str2, int2);
tmp[1] = x2;

As you may see in this disassembly:

            tmp[0] = new Tuple<string, int>("blah 1", 0);
00FB26AE  mov         ecx,6D5203BCh  
00FB26B3  call        00F32100  
00FB26B8  mov         dword ptr [ebp-48h],eax  
00FB26BB  push        0  
00FB26BD  mov         edx,dword ptr ds:[3B721F0h]  
00FB26C3  mov         ecx,dword ptr [ebp-48h]  
00FB26C6  call        6D47C0DC  
00FB26CB  push        dword ptr [ebp-48h]  
00FB26CE  mov         ecx,dword ptr [ebp-3Ch]   // ecx = (ebp - 0x3C) [ == tmp ]
00FB26D1  xor         edx,edx  
00FB26D3  call        6E2883FF                  // ecx.setElement(0, ebp - 0x48) 
            tmp[1] = new Tuple<string, int>("blah 2", 1);
00FB26D8  mov         ecx,6D5203BCh  
00FB26DD  call        00F32100  
00FB26E2  mov         dword ptr [ebp-4Ch],eax  
00FB26E5  push        1  
00FB26E7  mov         edx,dword ptr ds:[3B721F4h]  
00FB26ED  mov         ecx,dword ptr [ebp-4Ch]  
00FB26F0  call        6D47C0DC  
00FB26F5  push        dword ptr [ebp-4Ch]
00FB26F8  mov         ecx,dword ptr [ebp-3Ch]  // ecx = (ebp - 0x3C) [ == tmp ]
00FB26FB  mov         edx,1  
00FB2700  call        6E2883FF                 // ecx.setElement = (1, ebp - 0x4C)

Let us change your code to something like this:

Tuple<string, int>[] tmp = new Tuple<string, int>[10000];
var str = "blah 1";
var i = 0;
var x = new Tuple<string, int>(str, i);
tmp[0] = x;

str = "blah 2";
i = 1;
x = new Tuple<string, int>(str, i);
tmp[1] = x;

This code produces a function that uses less memory on the stack stack. However, upon deeper inspection, that code will also produce a "one time" variable on the stack for each new Tuple, so by increasing the amount of assignments you also increase the stack usage.

            str = "blah 2";
008A26E9  mov         eax,dword ptr ds:[32421F4h]  
008A26EF  mov         dword ptr [ebp-10h],eax  
            i = 1;
008A26F2  mov         dword ptr [ebp-8],1  
            x = new Tuple<string, int>(str, i);
008A26F9  mov         ecx,6D5203BCh  
008A26FE  call        006C2100  
008A2703  mov         dword ptr [ebp-20h],eax           // this is the one-time variable
008A2706  push        dword ptr [ebp-8]  
008A2709  mov         ecx,dword ptr [ebp-20h]  
008A270C  mov         edx,dword ptr [ebp-10h]  
008A270F  call        6D47C0DC  
008A2714  mov         eax,dword ptr [ebp-20h]  
008A2717  mov         dword ptr [ebp-14h],eax  
            tmp[1] = x;
008A271A  push        dword ptr [ebp-14h]  
008A271D  mov         ecx,dword ptr [ebp-0Ch]  
008A2720  mov         edx,1  
008A2725  call        6E2883FF  

            str = "blah 3";
008A272A  mov         eax,dword ptr ds:[32421F8h]  

            str = "blah 3";
008A2730  mov         dword ptr [ebp-10h],eax  
            i = 2;
008A2733  mov         dword ptr [ebp-8],2  
            x = new Tuple<string, int>(str, i);
008A273A  mov         ecx,6D5203BCh  
008A273F  call        006C2100  
008A2744  mov         dword ptr [ebp-24h],eax           // this is the one-time variable
008A2747  push        dword ptr [ebp-8]  
008A274A  mov         ecx,dword ptr [ebp-24h]  
008A274D  mov         edx,dword ptr [ebp-10h]  
008A2750  call        6D47C0DC  
008A2755  mov         eax,dword ptr [ebp-24h]  
008A2758  mov         dword ptr [ebp-14h],eax  
            tmp[2] = x;
008A275B  push        dword ptr [ebp-14h]  
008A275E  mov         ecx,dword ptr [ebp-0Ch]  
008A2761  mov         edx,2  
008A2766  call        6E2883FF  

What's worse, is that it will produce this "one time" variable in the stack even in release mode with optimizations turned on!

This causes me to believe this is a problem either the JIT engine or the compiler itself. So lets inspect the MSIL that the compiler gave us:

ldstr    aBlah2         // "blah 2"
stloc.1                 // Pop value from stack into local variable 1
ldc.i4.1                // Push 1 onto the stack as I4
stloc.2                 // Pop value from stack into local variable 2
ldloc.1                 // Load local variable 1 onto stack
ldloc.2                 // Load local variable 2 onto stack
newobj   instance void class [mscorlib]System.Tuple`2<string, int32>::.ctor(var<u1>, !!T0) // Create a new object
stloc.3                 // Pop value from stack into local variable 3
ldloc.0                 // Load local variable 0 onto stack
ldc.i4.1                // Push 1 onto the stack as I4
ldloc.3                 // Load local variable 3 onto stack
stelem.ref              // Replace array element at index with the ref value on the s

Which, when commented, is:

push "blah 2"
local_str = pop // "blah 2"
push 1
local_int = pop
push local_str // "blah 2"
push local_int // 1

push new Tuple(...)
local_tuple = pop
push local_array
push 0
push local_tuple
pop[pop] = pop (i.e arr[indx] = value)

So the JIT code generally seems OK.

Therefore I conclude that this is a problem in the JIT engine

Generally, this means that for each construction of the Tuple class an unnecessary DWORD is used in the stack, which is very bad for cases like yours, but doesn't mean anything for programs that don't do very much "manual" assignments like your code does.

This happens even for small functions, and is really weird!

In x64 bit, the following C# code:

var a = new object();
a = new object();
a = new object();
a = new object();
a = new object();
a = new object();
a = new object();

Compiled and JIT to:

            a = new object();
00007FFAD0033B5F  call        00007FFB2F662300  
00007FFAD0033B64  mov         qword ptr [rsp+40h],rax  
00007FFAD0033B69  mov         rax,qword ptr [rsp+40h]  
00007FFAD0033B6E  mov         qword ptr [rsp+48h],rax  
00007FFAD0033B73  mov         rcx,qword ptr [rsp+48h]  
00007FFAD0033B78  call        00007FFB2E455BC0  
00007FFAD0033B7D  nop  
            a = new object();
00007FFAD0033B7E  lea         rcx,[7FFB2E6611B8h]  
00007FFAD0033B85  call        00007FFB2F662300  
00007FFAD0033B8A  mov         qword ptr [rsp+50h],rax  
00007FFAD0033B8F  mov         rax,qword ptr [rsp+50h]  
00007FFAD0033B94  mov         qword ptr [rsp+58h],rax  
00007FFAD0033B99  mov         rcx,qword ptr [rsp+58h]  
00007FFAD0033B9E  call        00007FFB2E455BC0  
00007FFAD0033BA3  nop  
// and so on....

And produces many unused QWORDs.

On x86, the code looks like this:

            a = new object();
00882687  mov         ecx,6D512554h  
0088268C  call        00652100  
00882691  mov         dword ptr [ebp-0Ch],eax  
00882694  mov         ecx,dword ptr [ebp-0Ch]  
00882697  call        6D410B40  
0088269C  nop  
            a = new object();
0088269D  mov         ecx,6D512554h  
008826A2  call        00652100  
008826A7  mov         dword ptr [ebp-10h],eax  
008826AA  mov         ecx,dword ptr [ebp-10h]  
008826AD  call        6D410B40  
008826B2  nop  
// and so on...

Which is much more efficient, but still, "wastes" many DWORDS.

What can you do?

Actually, not much. The root of the problem lies in the JIT having to allocate a DWORD on the stack for each new operator (maybe so it can keep track of them? I can't tell). Your only solution (without that being fixed) is to make multiple functions that each will handle a part of the assignments that you need.

like image 193
Mark Segal Avatar answered Nov 04 '22 07:11

Mark Segal