
How do I track down the cause of a StackOverflowException in .NET?

I get a StackOverflowException when I run the following code:

    private void MyButton_Click(object sender, EventArgs e) {
      MyButton_Click_Aux();
    }

    private static volatile int reportCount;

    private static void MyButton_Click_Aux() {
      try { /*remove because stack overflows without*/ }
      finally {
        var myLogData = new ArrayList();
        myLogData.Add(reportCount);
        myLogData.Add("method MyButtonClickAux");
        Log(myLogData);
      }
    }

    private static void Log(object logData) {
      // my log code is not matter
    }

What could be causing the StackOverflowException?

asked Feb 02 '11 21:02 by PRASHANT P




2 Answers

I know how to stop it from happening

I just don't know why it causes it (yet). And it would appear you have indeed found a bug either in the .Net BCL or, more likely, in the JIT.

I just commented out all the lines in the MyButton_Click_Aux method and then started bringing them back in, one by one.

Take off the volatile from the static int and you'll no longer get a StackOverflowException.
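As a rough sketch of what "taking off the volatile" could look like while still getting fenced reads and writes, the field can be left plain and accessed through `Interlocked`. The names `ReportCounter`, `Read`, `Increment` and `BuildLogData` are made up for illustration and are not from the question, and I have not run this on an affected .NET 4.0 runtime, so treat it as an assumption rather than a confirmed workaround.

```csharp
using System;
using System.Collections;
using System.Threading;

// Hypothetical helper: not part of the original question's code.
public static class ReportCounter
{
    // Plain field, no 'volatile' modifier, as suggested above.
    private static int reportCount;

    public static int Increment()
    {
        // Atomic increment with a full memory barrier.
        return Interlocked.Increment(ref reportCount);
    }

    public static int Read()
    {
        // Comparand and replacement are both 0, so the field is never
        // changed; the call just returns the current value with a fence.
        return Interlocked.CompareExchange(ref reportCount, 0, 0);
    }

    public static ArrayList BuildLogData()
    {
        var myLogData = new ArrayList();
        try { }
        finally
        {
            // The value is read into a local first, so the box operates
            // on the local copy rather than on a volatile field load.
            int snapshot = Read();
            myLogData.Add(snapshot);
            myLogData.Add("method MyButtonClickAux");
        }
        return myLogData;
    }
}

public static class Program
{
    public static void Main()
    {
        ReportCounter.Increment();
        ArrayList data = ReportCounter.BuildLogData();
        Console.WriteLine("{0}, {1}", data[0], data[1]);
    }
}
```

`Interlocked.CompareExchange` is used for the read because it is available on .NET 4.0; on 4.5 and later `Volatile.Read` would be the more direct choice.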

Now to research why... Clearly something to do with Memory Barriers is causing an issue - perhaps somehow forcing the MyButton_Click_Aux method to call itself...

UPDATE

Okay, so other people are finding that the problem does not occur on .NET 3.5.

I'm using .NET 4 as well, so these comments relate to that:

As I said, take the volatile off and it works.

Equally, if you put the volatile back on and remove the try/finally it also works:

    private static void MyButton_Click_Aux() {
      //try { /*remove because stack overflows without*/ }
      //finally
      //{
        var myLogData = new ArrayList();
        myLogData.Add(reportCount);
        //myLogData.Add("method MyButtonClickAux");
        //Log(myLogData);
      //}
    }

I also wondered if it was something to do with the uninitialised reportCount when the try/finally is in. But it makes no difference if you initialise it to zero.

I'm looking at the IL now - although it might require someone with some ASM chops to get involved...

Final Update

As I say, this really is going to require analysis of the JIT output to really understand what's happening and whilst I find it fun to analyse assembler - I feel it's probably a job for someone in Microsoft so this bug can actually be confirmed and fixed! That said - it appears to be a pretty narrow set of circumstances.

I've moved over to a release build to get rid of all the IL noise (nops etc) for analysis.

This has, however, had a complicating impact on the diagnosis. I thought I had it but didn't - but now I know what it is.

I tried this code:

    private static void MyButton_Click_Aux() {
      try { }
      finally
      {
        var myLogData = new ArrayList();
        Console.WriteLine(reportCount);
        //myLogData.Add("method MyButtonClickAux");
        //Log(myLogData);
      }
    }

With the int as volatile. It runs without fault. Here's the IL:

    .maxstack 1
    L_0000: leave.s L_0015
    L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
    L_0007: pop
    L_0008: volatile.
    L_000a: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
    L_000f: call void [mscorlib]System.Console::WriteLine(int32)
    L_0014: endfinally
    L_0015: ret
    .try L_0000 to L_0002 finally handler L_0002 to L_0015

Then we look at the minimum code required to get the error again:

    private static void MyButton_Click_Aux() {
      try { }
      finally
      {
        var myLogData = new ArrayList();
        myLogData.Add(reportCount);
      }
    }

And its IL:

    .maxstack 2
    .locals init (
        [0] class [mscorlib]System.Collections.ArrayList myLogData)
    L_0000: leave.s L_001c
    L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
    L_0007: stloc.0
    L_0008: ldloc.0
    L_0009: volatile.
    L_000b: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
    L_0010: box int32
    L_0015: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
    L_001a: pop
    L_001b: endfinally
    L_001c: ret
    .try L_0000 to L_0002 finally handler L_0002 to L_001c

The difference? Well, there are two that I spotted - boxing of the volatile int, and a virtual call. So I set up these two classes:

    public class DoesNothingBase {
      public void NonVirtualFooBox(object arg) { }
      public void NonVirtualFooNonBox(int arg) { }

      public virtual void FooBox(object arg) { }
      public virtual void FooNonBox(int arg) { }
    }

    public class DoesNothing : DoesNothingBase {
      public override void FooBox(object arg) { }
      public override void FooNonBox(int arg) { }
    }

And then tried each of these four versions of the offending method:

    try { }
    finally {
      var doesNothing = new DoesNothing();
      doesNothing.FooNonBox(reportCount);
    }

Which works.

    try { }
    finally {
      var doesNothing = new DoesNothing();
      doesNothing.NonVirtualFooNonBox(reportCount);
    }

Which also works.

    try { }
    finally {
      var doesNothing = new DoesNothing();
      doesNothing.FooBox(reportCount);
    }

Oops - StackOverflowException.

And:

    try { }
    finally {
      var doesNothing = new DoesNothing();
      doesNothing.NonVirtualFooBox(reportCount);
    }

Oops again! StackOverflowException!

We could go further with this - but the issue is, I feel, clearly caused by the boxing of the volatile int whilst inside the finally block of a try/finally... I put the code inside the try, and no problem. I added a catch clause (and put the code in there), also no problem.

It could also apply to the boxing of other value types I guess.

So, to summarise - in .Net 4.0 - in both debug and release builds - the boxing of a volatile int in a finally block appears to cause the JIT to generate code that ends up filling the stack. The fact that the stack trace simply shows 'external code' also supports this proposition.
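For anyone who wants to check their own runtime, the summary above can be condensed into a console-only sketch. The original was a Windows Forms button handler, so moving it into a console `Main` is my assumption, and the class name `VolatileBoxRepro` and method name `BoxInFinally` are invented for illustration. Given the narrow circumstances described, it may well not reproduce in this form; on a runtime that includes the fix it should simply print both lines and exit.

```csharp
using System;
using System.Collections;

// Hypothetical console port of the minimal failing method above.
public static class VolatileBoxRepro
{
    private static volatile int reportCount;

    public static void BoxInFinally()
    {
        try { }
        finally
        {
            var myLogData = new ArrayList();
            // ArrayList.Add takes object, so the volatile int is boxed here,
            // inside the finally block: the combination the answer isolates.
            myLogData.Add(reportCount);
        }
    }

    public static void Main()
    {
        // Print the CLR version so results can be compared across machines.
        Console.WriteLine("Runtime: " + Environment.Version);
        BoxInFinally();
        // Only reached if the call above did not take the process down.
        Console.WriteLine("BoxInFinally returned normally");
    }
}
```

Since a StackOverflowException cannot be caught from .NET 2.0 onwards, the sketch does not attempt a try/catch around the call: if the bug triggers, the process is terminated and the second line is never printed.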

There's even a possibility that it can't always be reproduced and might even depend on the layout and size of the code that is generated by the try/finally. It's clearly something to do with an errant jmp or something similar being generated to the wrong location which eventually repeats one or more push commands to the stack. The idea that that is being caused actually by a box operation is, frankly, fascinating!

Final Final Update

If you look at the MS Connect bug that @Hasty G found (answer further down) - you see there that the bug manifests in a similar fashion, but with a volatile bool in a catch statement.

Also - MS queued a fix for this after getting it to repro - but no hotfix available yet after 7 months. I've gone on record before as being in support of MS Connect, so I'll say no more - I don't think I need to!

Final Final Final Update (23/02/2011)

It is fixed - but not yet released. Quote from the MS Team on the MS Connect bug:

Yes, it's fixed. We're in the process of figuring out how best to ship a fix. It is already fixed in 4.5, but we'd really like to fix a batch of code generation bugs prior to 4.5 release.

answered Sep 21 '22 02:09 by Andras Zoltan


The bug is in your code. Presumably, MyButton_Click_Aux() causes some method to be re-entered. However, you've inexplicably omitted that code from your question and so no one can comment on it.

answered Sep 22 '22 02:09 by Jonathan Wood