
StackOverflowException on startup with a large collection initializer

I'm building an application which uses relatively large tables to do its work (LR tables, to be precise). As I'm generating code anyway and the table isn't that large, I decided to serialize my table by generating code that uses the C# collection initializer syntax to initialize the table on startup of my generated program:

public static readonly int[,] gotoTable = new int[,]
{
    {
        0,1,0,0,0,0,0,0,0,0,0,0,0,0,(...)
    },
    {
        0,0,4,0,5,6,0,0,0,0,0,7,0,0,(...)
    },
    (...)

Oddly enough, when I generated a table with only a couple hundred thousand entries, the generated application crashed with a StackOverflowException on startup. The C# compiler compiles it just fine, and the table generation application also runs just fine. In fact, when I switched to Release mode, the application did start up. An OutOfMemoryException might have made some sense, but even then the table I use is way too small to cause one.

Code to reproduce this:

Warning: trying the code below in release mode crashed Visual Studio 2010 for me; watch out for losing unsaved work. Additionally, if you generate code for which the compiler generates lots of errors, Visual Studio will hang as well.

//Generation Project, main.cs:
using (StreamWriter writer = new StreamWriter("../../../VictimProject/Tables.cs"))
{
    writer.WriteLine("using System;");
    writer.WriteLine("public static class Tables");
    writer.WriteLine("{");
    writer.WriteLine("    public static readonly Tuple<int>[] bigArray = new Tuple<int>[]");
    writer.WriteLine("    {");
    for (int i = 0; i < 300000; i++)
        writer.WriteLine("        new Tuple<int>(" + i + "),");
    writer.WriteLine("    };");
    writer.WriteLine("}");
}
//Victim Project, main.cs:
for (int i = 0; i < 1234; i++)
{
    // Preventing the jitter from removing Tables.bigArray
    if (Tables.bigArray[i].Item1 == 10)
        Console.WriteLine("Found it!");
}
Console.ReadKey(true);

Run the first project to generate the Tables.cs file, and then run the second program to get the StackOverflowException. Note that the above crashes on my computer; it might not crash on other platforms or configurations. Try increasing 300000 if it doesn't.

Using release mode instead of debug mode seems to increase the limit slightly, as my project doesn't crash in release mode. However, the code above crashes in both modes for me.

Using literal ints or strings instead of Tuple<int>s doesn't cause the crash, nor does "new int()" (but that might get converted into a literal 0). Using a struct with a single int field does cause the crash. It seems to be related to using a constructor as initializer.
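Given that observation, one possible workaround (a sketch, not something tested against the original project) is to have the generator emit only a literal int[] and to construct the Tuple<int> objects in a loop at startup. The names TablesWorkaround, rawValues and Build are hypothetical, and the five-element array stands in for the generated data:

```csharp
using System;

public static class TablesWorkaround
{
    // Hypothetical stand-in for the generated data: literal ints only,
    // so no constructor call appears inside the array initializer.
    private static readonly int[] rawValues = { 0, 1, 2, 3, 4 };

    // Built by a loop, so the size of this code does not grow
    // with the number of table entries.
    public static readonly Tuple<int>[] bigArray = Build(rawValues);

    private static Tuple<int>[] Build(int[] values)
    {
        var result = new Tuple<int>[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = new Tuple<int>(values[i]);
        return result;
    }
}
```

Static field initializers run in textual order, so rawValues is already populated when bigArray is built.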

My guess is that the collection initializer is somehow implemented recursively, which would explain the stack overflow. However, that would be a very weird thing to do, as an iterative solution seems a lot simpler and more efficient. The C# compiler itself doesn't have any problems with the program and compiles it very fast (it handles even larger collections well, but it does crash on positively huge collections, as expected).

I guess there's probably some way to write my table directly to a binary file and then link that file, but I haven't looked at that yet.
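For the generator side, writing the table to a binary file could look roughly like the sketch below. The layout (two int32 dimensions followed by the entries in row-major order) and the names TableWriter and Write are assumptions made for illustration, not an established format:

```csharp
using System.IO;

public static class TableWriter
{
    // Assumed layout: row count, column count, then every entry
    // in row-major order, all as 32-bit integers.
    public static void Write(Stream output, int[,] table)
    {
        var writer = new BinaryWriter(output);
        int rows = table.GetLength(0);
        int cols = table.GetLength(1);

        writer.Write(rows);
        writer.Write(cols);
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                writer.Write(table[r, c]);

        // Flush but do not dispose, so the caller keeps ownership of the stream.
        writer.Flush();
    }
}
```

The generator would call it with something like `using (var file = File.Create("gotoTable.bin")) TableWriter.Write(file, gotoTable);`, where the file name is again hypothetical.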

I guess I have two questions: why does the above happen, and how do I work around it?

Edit: some interesting details after disassembling the .exe:

.maxstack  4
.locals init ([0] class [mscorlib]System.Tuple`1<int32>[] CS$0$0000)
IL_0000:  ldc.i4     0x493e0
IL_0005:  newarr     class [mscorlib]System.Tuple`1<int32>
IL_000a:  stloc.0
IL_000b:  ldloc.0
IL_000c:  ldc.i4.0
IL_000d:  ldc.i4.0
IL_000e:  newobj     instance void class [mscorlib]System.Tuple`1<int32>::.ctor(!0)
IL_0013:  stelem.ref
IL_0014:  ldloc.0
IL_0015:  ldc.i4.1
IL_0016:  ldc.i4.1
IL_0017:  newobj     instance void class [mscorlib]System.Tuple`1<int32>::.ctor(!0)
IL_001c:  stelem.ref
(goes on and on)

This suggests that the jitter indeed crashes with a stack overflow trying to jit this method. Still, it's weird that it does, and in particular, that I get an exception out of it.

Alex ten Brink asked Apr 12 '12 21:04


2 Answers

why does the above happen

I suspect it may be the JIT crashing. You will be generating an enormous type initializer (.cctor member in IL). Each value is going to be 5 IL instructions. I'm not entirely surprised a member with 1.5 million instructions causes problems...

and how do I work around it?

Include the data as an embedded resource instead, and load it in the type initializer if you need to. I'm assuming this is generated data, so put the data where it belongs: in a binary file rather than in literal code.
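A minimal sketch of that approach, assuming the file stores two int32 dimensions followed by the entries in row-major order (that layout, and the names TableReader, Read and LoadResource, are assumptions for illustration):

```csharp
using System;
using System.IO;
using System.Reflection;

public static class TableReader
{
    // Reads a table in the assumed layout: rows, columns, then the
    // entries in row-major order, all as 32-bit integers.
    public static int[,] Read(Stream input)
    {
        var reader = new BinaryReader(input);
        int rows = reader.ReadInt32();
        int cols = reader.ReadInt32();

        var table = new int[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                table[r, c] = reader.ReadInt32();
        return table;
    }

    // Opens an embedded resource by name and reads the table from it.
    public static int[,] LoadResource(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            // GetManifestResourceStream returns null when the name is not found.
            if (stream == null)
                throw new InvalidOperationException("Embedded resource not found: " + resourceName);
            return Read(stream);
        }
    }
}
```

The generated class could then declare `public static readonly int[,] gotoTable = TableReader.LoadResource(typeof(Tables).Assembly, "VictimProject.gotoTable.bin");`. The resource name here is hypothetical; the real one depends on the project's default namespace and the name of the embedded file.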

Jon Skeet answered Nov 14 '22 01:11


If it tries to pre-push all those onto the stack, that is going to need a mass of stack space, so personally I would indeed expect a stack overflow here, depending on how the compiler does it.

Having done something similar before (something that breaks every tool like Reflector, because the IL is too big), my advice from experience is: do that via serialization, not via C#. In my case I did pretty much exactly that via protobuf-net, i.e.

  • generated the model (without data) as code
  • executed it to populate the model from the database
  • serialized it to a file
  • shipped the file with my deployment
  • deserialized during initialisation

But - I seem to recall having this discussion recently; if it was with yourself, then I stand entirely by my previous remarks. The way you are trying to do it is still problematic. The above approach (from direct experience) works very well. As IL? Not so much.

Note: If you absolutely wanted to write the file without the execute step, that is possible too - just trickier.

Marc Gravell answered Nov 14 '22 01:11