I have a .NET 2.0 console application running on a Windows Server GoDaddy VPS in the Visual Studio 2010 IDE in debug mode (F5).
The application periodically freezes (as if the garbage collector had temporarily suspended execution); on rare occasions, however, it never resumes execution!
I've been diagnosing this for months, and am running out of ideas.
Anyone have any tips for diagnosing what exactly is happening?
On BLOCKING vs SUSPENDING: a process is blocked when there is some external reason that it cannot continue, e.g., an I/O device is unavailable or a semaphore file is locked. A process being suspended means that the OS has stopped executing it, but that could just be for time-slicing (multitasking).
When the processes in main memory have all entered the blocked state, the operating system can suspend one of them by putting it in the Suspended state and transferring it to disk. The memory freed this way is used to bring in another process.
It is also multi-threaded
That's the key part of the problem. You are describing a very typical way in which a multi-threaded program can misbehave. It is suffering from deadlock, one of the typical problems with threading.
It can be narrowed down a bit further from the info: clearly your process isn't completely frozen, since it still consumes 100% CPU. You probably have a hot wait-loop in your code, a loop that spins waiting for another thread to signal an event. That is likely to induce an especially nasty variety of deadlock, a live-lock. Live-locks are very sensitive to timing; minor changes in the order in which code runs can bump it into a live-lock, and back out again.
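To make the "hot wait-loop" idea concrete, here is a minimal sketch contrasting a spinning wait with a blocking wait. The asker's code is not shown, so this is only an illustration, written in Java; the names `WaitDemo`, `ready`, `spinUntilReady` and `blockUntilSignaled` are hypothetical. In .NET the blocking variant would roughly correspond to waiting on a `ManualResetEvent` with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class WaitDemo {
    // Flag polled by the spinning variant; volatile so the write is visible across threads.
    static volatile boolean ready = false;

    // Hot wait-loop: keeps one core at 100% until another thread sets the flag.
    // Returns how many times the loop body ran.
    static long spinUntilReady() {
        long spins = 0;
        while (!ready) {
            spins++;
        }
        return spins;
    }

    // Blocking alternative: the waiting thread is parked and uses no CPU.
    // Returns false if no signal arrived within the timeout or the wait was interrupted.
    static boolean blockUntilSignaled(CountDownLatch signal, long timeoutMs) {
        try {
            return signal.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch signal = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50); // pretend to do some work first
            } catch (InterruptedException e) {
                return;
            }
            ready = true;        // releases the spinning waiter
            signal.countDown();  // releases the blocking waiter
        });
        worker.start();

        long spins = spinUntilReady();
        boolean signaled = blockUntilSignaled(signal, 1000);
        worker.join();
        System.out.println("spins=" + spins + " signaled=" + signaled);
    }
}
```

If the worker never sets `ready`, `spinUntilReady` never returns and keeps burning CPU, which matches the symptom described; the blocking variant at least gives up after its timeout, so the failure becomes visible.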
Live-locks are extraordinarily difficult to debug since attempting to do so makes the condition disappear. Attaching a debugger or breaking into the code is enough to alter the thread timing and bump it out of the condition. So is adding logging statements to your code, a common strategy for debugging threading problems: the logging overhead alters the timing, which in turn can make the live-lock disappear entirely.
Nasty stuff and impossible to get help with such a problem from a site like SO since it is extremely dependent on the code. A thorough review of the code is often required to find the reason. And not infrequently a drastic rewrite. Good luck with it.
Does the application have "deadlock recovery/prevention" code? That is, locking with a timeout, then trying again, perhaps after a sleep?
Does the application check error codes (return values or exceptions) and repeatedly retry in case of error anywhere?
Note that such looping can also happen through an event loop, where your code is only in some event handler. It does not have to be an actual loop in your own code. Though this is probably not the case if the application is frozen, which indicates a blocked event loop.
If you have anything like the above, you could try to mitigate the problem by giving timeouts and sleeps random intervals, as well as adding short random-duration sleeps to cases where an error might produce a dead-/livelock. If such a loop is performance-sensitive, add a counter and only start sleeping, with a random and perhaps increasing interval, after some number of failed retries. And make sure any sleep you add does not sleep while something is locked.
If the situation happened more often, you could also use this to bisect your code and pinpoint which loops are responsible (100% CPU usage means some very busy loops are spinning). But from the rarity of the issue, I gather you're going to be happy if the problem just goes away in practice ;)
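The mitigation described above could look roughly like the following sketch. It is written in Java purely for illustration, since the asker's code is not shown; the class `BackoffRetry`, its methods, and all tuning constants are hypothetical. In .NET, the timed lock attempt would roughly correspond to `Monitor.TryEnter` with a timeout.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BackoffRetry {
    // Hypothetical tuning values, chosen only for illustration.
    static final int FAST_RETRIES = 3;     // attempts made without sleeping
    static final long BASE_SLEEP_MS = 5;   // first sleep bound once backoff starts
    static final long MAX_SLEEP_MS = 200;  // cap on the sleep bound

    // Upper bound for the random sleep after the given number of failures;
    // 0 means "do not sleep yet". The bound doubles with each further failure.
    static long sleepBoundMs(int failedAttempts) {
        if (failedAttempts <= FAST_RETRIES) {
            return 0;
        }
        int shift = Math.min(failedAttempts - FAST_RETRIES - 1, 20);
        return Math.min(BASE_SLEEP_MS << shift, MAX_SLEEP_MS);
    }

    // Runs the action under the lock; returns false if every attempt timed out
    // or the thread was interrupted.
    static boolean runWithLock(ReentrantLock lock, Runnable action, int maxAttempts) {
        try {
            for (int failed = 0; failed < maxAttempts; failed++) {
                long bound = sleepBoundMs(failed);
                if (bound > 0) {
                    // The lock is not held at this point, so the sleep blocks nobody.
                    Thread.sleep(ThreadLocalRandom.current().nextLong(1, bound + 1));
                }
                // Random timeout so competing threads do not retry in lockstep.
                long timeoutMs = ThreadLocalRandom.current().nextLong(10, 50);
                if (lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    try {
                        action.run();
                        return true;
                    } finally {
                        lock.unlock();
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        boolean ok = runWithLock(lock, () -> System.out.println("working under lock"), 10);
        System.out.println("completed=" + ok);
    }
}
```

The counter (`failed`) keeps the first few retries fast for performance-sensitive loops, and the randomness breaks the symmetry between threads that would otherwise keep colliding on the same schedule.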