While doing some fancy code generation, I've encountered a stack overflow that I don't understand.
My code is basically like this:
static Tuple<string, int>[] DoWork()
{
// [ call some methods ]
Tuple<string, int>[] tmp = new Tuple<string, int>[100];
tmp[0] = new Tuple<string, int>("blah 1", 0);
tmp[1] = new Tuple<string, int>("blah 2", 1);
tmp[2] = new Tuple<string, int>("blah 3", 2);
// ...
tmp[99] = new Tuple<string, int>("blah 99", 99);
return tmp;
}
If you use small numbers like here (100) everything works fine. If the numbers are large, strange things happens. In my case, I tried emitting approximately 10K lines of code like this, which triggered a stack overflow exception.
So... why do I think this is strange:
Reproducing the strangeness...
I cannot reproduce the stackoverflow in a minimum test case, but I did notice it seems to be triggered on 64-bit .NET 4.5. What I can give is some evidence that demonstrates what's going on.
Also note that the real code uses Reflection.Emit
code that does this code generation... it's not like the code itself has all these lines of code... The emitted IL code is correct BTW.
In Visual Studio - put a breakpoint on the last line. Notice the use of the stack pointer in the disassembly (ASM, not IL).
Now add a new line to the code -- e.g. tmp[100] = // the usuals
. Put a breakpoint here as well and notice that the used stack space grows.
As for an attempt to reproduce using a minimum test-case using Reflection.Emit
, this is the code (which DOES NOT reproduce the problem strangely enough -- but is very close to what I've done to trigger the stack overflow... it should give a bit of a picture what I'm trying to do, and perhaps someone else can produce a viable test case using this). Here goes:
public static void Foo()
{
Console.WriteLine("Foo!");
}
static void Main(string[] args)
{
// all this just to invoke one opcode with no arguments!
var assemblyName = new AssemblyName("MyAssembly");
var assemblyBuilder =
AppDomain.CurrentDomain.DefineDynamicAssembly(assemblyName,
AssemblyBuilderAccess.RunAndCollect);
// Create module
var moduleBuilder = assemblyBuilder.DefineDynamicModule("MyModule");
var type = moduleBuilder.DefineType("MyType", TypeAttributes.Public, typeof(object));
var method = type.DefineMethod("Test", System.Reflection.MethodAttributes.Public | System.Reflection.MethodAttributes.Static, System.Reflection.CallingConventions.Standard, typeof(Tuple<string, int>[]), new Type[0]);
ILGenerator gen = method.GetILGenerator();
int count = 0x10000;
gen.Emit(OpCodes.Call, typeof(StackOverflowGenerator).GetMethod("Foo"));
var loc = gen.DeclareLocal(typeof(Tuple<string, int>[]));
gen.Emit(OpCodes.Ldc_I4, count);
gen.Emit(OpCodes.Newarr, typeof(Tuple<string, int>));
gen.Emit(OpCodes.Stloc, loc);
for (int i = 0; i < count; ++i)
{
// Load array
gen.Emit(OpCodes.Ldloc, loc);
gen.Emit(OpCodes.Ldc_I4, i);
// Construct tuple:
gen.Emit(OpCodes.Ldstr, "This is the string");
gen.Emit(OpCodes.Ldc_I4, i);
gen.Emit(OpCodes.Newobj, typeof(Tuple<string, int>).GetConstructor(new[] { typeof(string), typeof(int) }));
// Store in the array
gen.Emit(OpCodes.Stelem_Ref);
}
// Return the result
gen.Emit(OpCodes.Ldloc, loc);
gen.Emit(OpCodes.Ret);
var materialized = type.CreateType();
var tmp = checked((Tuple<string, int>[])materialized.GetMethod("Test").Invoke(null, new object[0]));
int total = 0;
foreach (var item in tmp)
{
total += item.Item1.Length + item.Item2;
}
Console.WriteLine("Total: {0}", total);
Console.ReadLine();
}
My question
How on earth can something like this produce a SOE? What's going on here? Why are things put on the stack in this context anyways?
Every new
operator in a function requires a DWORD
at the stack, even new object()
, that will be present regardless of optimization and release/debug mode! This effectively means that you are limited in the amount of times the new
keyword is present in a function, according to your stack size.
The SOF is caused because the JIT generates code that tries to allocate too much space on the stack (using sub esp <number>
). The JIT chooses how much to allocate upon inspecting the usage of the stack in the function. If you have many local variables, your function will have to use more memory on the stack, and the JIT has no way of knowing how large will the stack be at runtime, so it crashes at runtime. A temporary solution might be to make the stack larger using compiler flags or such.
Your code doesn't use many variable on the stack, in fact, you explicitly use only one, the pointer to the array.
However, your code (when used without optimizations) creates many "temporary one-time" variables, each for each string
and each integer
that you use in the new Tuple<...>
. They will disappear with optimization turned on.
i.e, instead of something like this:
var x = new Tuple<string, int>("blah 1", 0);
tmp[0] = x;
x = new Tuple<string, int>("blah 2", 1);
tmp[1] = x;
You end up with something like this:
var str1 = "blah 1";
var int1 = 0;
var x = new Tuple<string, int>(str1, int1);
tmp[0] = x;
var str2 = "blah 2";
var int2 = 1;
var x2 = new Tuple<string, int>(str2, int2);
tmp[1] = x2;
As you may see in this disassembly:
tmp[0] = new Tuple<string, int>("blah 1", 0);
00FB26AE mov ecx,6D5203BCh
00FB26B3 call 00F32100
00FB26B8 mov dword ptr [ebp-48h],eax
00FB26BB push 0
00FB26BD mov edx,dword ptr ds:[3B721F0h]
00FB26C3 mov ecx,dword ptr [ebp-48h]
00FB26C6 call 6D47C0DC
00FB26CB push dword ptr [ebp-48h]
00FB26CE mov ecx,dword ptr [ebp-3Ch] // ecx = (ebp - 0x3C) [ == tmp ]
00FB26D1 xor edx,edx
00FB26D3 call 6E2883FF // ecx.setElement(0, ebp - 0x48)
tmp[1] = new Tuple<string, int>("blah 2", 1);
00FB26D8 mov ecx,6D5203BCh
00FB26DD call 00F32100
00FB26E2 mov dword ptr [ebp-4Ch],eax
00FB26E5 push 1
00FB26E7 mov edx,dword ptr ds:[3B721F4h]
00FB26ED mov ecx,dword ptr [ebp-4Ch]
00FB26F0 call 6D47C0DC
00FB26F5 push dword ptr [ebp-4Ch]
00FB26F8 mov ecx,dword ptr [ebp-3Ch] // ecx = (ebp - 0x3C) [ == tmp ]
00FB26FB mov edx,1
00FB2700 call 6E2883FF // ecx.setElement = (1, ebp - 0x4C)
Let us change your code to something like this:
Tuple<string, int>[] tmp = new Tuple<string, int>[10000];
var str = "blah 1";
var i = 0;
var x = new Tuple<string, int>(str, i);
tmp[0] = x;
str = "blah 2";
i = 1;
x = new Tuple<string, int>(str, i);
tmp[1] = x;
This code produces a function that uses less memory on the stack stack. However, upon deeper inspection, that code will also produce a "one time" variable on the stack for each new Tuple
, so by increasing the amount of assignments you also increase the stack usage.
str = "blah 2";
008A26E9 mov eax,dword ptr ds:[32421F4h]
008A26EF mov dword ptr [ebp-10h],eax
i = 1;
008A26F2 mov dword ptr [ebp-8],1
x = new Tuple<string, int>(str, i);
008A26F9 mov ecx,6D5203BCh
008A26FE call 006C2100
008A2703 mov dword ptr [ebp-20h],eax // this is the one-time variable
008A2706 push dword ptr [ebp-8]
008A2709 mov ecx,dword ptr [ebp-20h]
008A270C mov edx,dword ptr [ebp-10h]
008A270F call 6D47C0DC
008A2714 mov eax,dword ptr [ebp-20h]
008A2717 mov dword ptr [ebp-14h],eax
tmp[1] = x;
008A271A push dword ptr [ebp-14h]
008A271D mov ecx,dword ptr [ebp-0Ch]
008A2720 mov edx,1
008A2725 call 6E2883FF
str = "blah 3";
008A272A mov eax,dword ptr ds:[32421F8h]
str = "blah 3";
008A2730 mov dword ptr [ebp-10h],eax
i = 2;
008A2733 mov dword ptr [ebp-8],2
x = new Tuple<string, int>(str, i);
008A273A mov ecx,6D5203BCh
008A273F call 006C2100
008A2744 mov dword ptr [ebp-24h],eax // this is the one-time variable
008A2747 push dword ptr [ebp-8]
008A274A mov ecx,dword ptr [ebp-24h]
008A274D mov edx,dword ptr [ebp-10h]
008A2750 call 6D47C0DC
008A2755 mov eax,dword ptr [ebp-24h]
008A2758 mov dword ptr [ebp-14h],eax
tmp[2] = x;
008A275B push dword ptr [ebp-14h]
008A275E mov ecx,dword ptr [ebp-0Ch]
008A2761 mov edx,2
008A2766 call 6E2883FF
What's worse, is that it will produce this "one time" variable in the stack even in release mode with optimizations turned on!
This causes me to believe this is a problem either the JIT engine or the compiler itself. So lets inspect the MSIL that the compiler gave us:
ldstr aBlah2 // "blah 2"
stloc.1 // Pop value from stack into local variable 1
ldc.i4.1 // Push 1 onto the stack as I4
stloc.2 // Pop value from stack into local variable 2
ldloc.1 // Load local variable 1 onto stack
ldloc.2 // Load local variable 2 onto stack
newobj instance void class [mscorlib]System.Tuple`2<string, int32>::.ctor(var<u1>, !!T0) // Create a new object
stloc.3 // Pop value from stack into local variable 3
ldloc.0 // Load local variable 0 onto stack
ldc.i4.1 // Push 1 onto the stack as I4
ldloc.3 // Load local variable 3 onto stack
stelem.ref // Replace array element at index with the ref value on the s
Which, when commented, is:
push "blah 2"
local_str = pop // "blah 2"
push 1
local_int = pop
push local_str // "blah 2"
push local_int // 1
push new Tuple(...)
local_tuple = pop
push local_array
push 0
push local_tuple
pop[pop] = pop (i.e arr[indx] = value)
So the JIT code generally seems OK.
Therefore I conclude that this is a problem in the JIT engine
Generally, this means that for each construction of the Tuple
class an unnecessary DWORD
is used in the stack, which is very bad for cases like yours, but doesn't mean anything for programs that don't do very much "manual" assignments like your code does.
In x64 bit, the following C# code:
var a = new object();
a = new object();
a = new object();
a = new object();
a = new object();
a = new object();
a = new object();
Compiled and JIT to:
a = new object();
00007FFAD0033B5F call 00007FFB2F662300
00007FFAD0033B64 mov qword ptr [rsp+40h],rax
00007FFAD0033B69 mov rax,qword ptr [rsp+40h]
00007FFAD0033B6E mov qword ptr [rsp+48h],rax
00007FFAD0033B73 mov rcx,qword ptr [rsp+48h]
00007FFAD0033B78 call 00007FFB2E455BC0
00007FFAD0033B7D nop
a = new object();
00007FFAD0033B7E lea rcx,[7FFB2E6611B8h]
00007FFAD0033B85 call 00007FFB2F662300
00007FFAD0033B8A mov qword ptr [rsp+50h],rax
00007FFAD0033B8F mov rax,qword ptr [rsp+50h]
00007FFAD0033B94 mov qword ptr [rsp+58h],rax
00007FFAD0033B99 mov rcx,qword ptr [rsp+58h]
00007FFAD0033B9E call 00007FFB2E455BC0
00007FFAD0033BA3 nop
// and so on....
And produces many unused QWORD
s.
On x86, the code looks like this:
a = new object();
00882687 mov ecx,6D512554h
0088268C call 00652100
00882691 mov dword ptr [ebp-0Ch],eax
00882694 mov ecx,dword ptr [ebp-0Ch]
00882697 call 6D410B40
0088269C nop
a = new object();
0088269D mov ecx,6D512554h
008826A2 call 00652100
008826A7 mov dword ptr [ebp-10h],eax
008826AA mov ecx,dword ptr [ebp-10h]
008826AD call 6D410B40
008826B2 nop
// and so on...
Which is much more efficient, but still, "wastes" many DWORDS
.
Actually, not much. The root of the problem lies in the JIT having to allocate a DWORD
on the stack for each new
operator (maybe so it can keep track of them? I can't tell). Your only solution (without that being fixed) is to make multiple functions that each will handle a part of the assignments that you need.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With