I'm currently investigating a bug that causes Windows to freeze up. After the bug happens, all processes that are currently running continue to run, but if you try to use them they eventually freeze up.
For example, I had Task Manager and a couple of cmd windows open at the moment of the freeze. Task Manager works nicely and displays processor/memory usage, the list of all processes, etc. But if I try to kill a process, it freezes up. If I try to open File -> New Task, it freezes up. In cmd, if I try to open a Windows application, the command executes and the new process appears in Task Manager, but the application does not start up. Even starting a command-line application freezes up.
The software in question is a set of 12 various service applications that communicate with each other using WCF. Most of it is written in C#, with some Fortran and C++. All of this runs in user space; we have nothing executing in kernel space.
So my question is: has anyone seen this or similar behavior? What were the causes? In theory, nothing a user-space application does should freeze the whole OS. Any tips on debugging this situation would also be helpful. Thank you for your time.
We've tried writing a small application that constantly writes to and reads from disk (with random seeks and opening/closing of the file), and started it before the system freezes. The application kept on successfully writing, reading, opening, and closing files after the freeze. Memory usage is the same as in normal use, between 4 and 5 GB; the system has 6 GB.
We also did a memory dump; the trouble is that we failed to figure out what is happening. The dump of course shows that Windows has frozen in the keyboard driver, but besides that we couldn't figure much out. It would be much more useful if we could do a user-space memory dump. OK, this sentence made me Google a bit: it appears there is a complete memory dump option. I will research this some more and update on progress.
Our current suspect is the NOD32 firewall; when it's off, everything appears to be working OK. We still need to confirm this and find out what in our code is provoking this behavior.
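For anyone who wants to repeat this check, here is a minimal sketch of such a disk probe in Python. It is not the application we actually used; the function name `probe_disk` and all of its parameters are made up for illustration. Each round opens the file, seeks to a random offset, writes a block, forces it to disk, reads it back, and closes the file, recording how long the round took.

```python
import os
import random
import time


def probe_disk(path, rounds=100, file_size=1 << 20, block=4096, seed=None):
    """Hypothetical probe: open/seek/write/read/close in a loop.

    Returns the wall-clock duration of each round in seconds, so a stall
    shows up as one very large value (or as a round that never returns).
    """
    rng = random.Random(seed)
    # Pre-size the file so every random offset is valid.
    with open(path, "wb") as f:
        f.truncate(file_size)

    latencies = []
    for _ in range(rounds):
        payload = os.urandom(block)
        offset = rng.randrange(0, file_size - block)
        start = time.perf_counter()
        # Re-open the file every round, since opening and closing
        # handles is part of what is being exercised.
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the write down to the device
            f.seek(offset)
            data = f.read(block)
        latencies.append(time.perf_counter() - start)
        if data != payload:
            raise IOError("read-back mismatch at offset %d" % offset)
    return latencies
```

Running it in a loop with a large `rounds` value and printing `max(latencies)` now and then is enough to see whether plain file I/O from an already-running process still completes after the freeze.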
Thanks everybody for your assistance.
OK, I've managed to create a full memory dump. It wasn't as easy as I hoped; here are some useful resources, maybe they will help someone someday:
http://www.osronline.com/article.cfm?article=545
http://blogs.msdn.com/b/ntdebugging/archive/2010/04/02/how-to-use-the-dedicateddumpfile-registry-value-to-overcome-space-limitations-on-the-system-drive-when-capturing-a-system-memory-dump.aspx
Once the system froze, I started one cmd.exe and initiated a copy command. The cmd froze, and here is its stack trace:
fffff880`087571d0 fffff800`02cc2992 nt!KiSwapContext+0x7a
fffff880`08757310 fffff800`02cc4d0f nt!KiCommitThreadWait+0x1d2
fffff880`087573a0 fffff800`02cd9d1f nt!KeWaitForSingleObject+0x19f
fffff880`08757440 fffff800`02fc06d6 nt!AlpcpSignalAndWait+0x8f
fffff880`087574f0 fffff800`02fbe660 nt!AlpcpReceiveSynchronousReply+0x46
fffff880`08757550 fffff800`02fcd13d nt!AlpcpProcessSynchronousRequest+0x33d
fffff880`08757670 fffff800`030ade59 nt!LpcpRequestWaitReplyPort+0x9c
fffff880`087576d0 fffff880`05ad1344 nt!LpcRequestWaitReplyPort+0x19
fffff880`08757710 fffff880`05ad430f eamon+0x5344
fffff880`087578d0 fffff880`05ad25bb eamon+0x830f
fffff880`08757970 fffff800`02fd075f eamon+0x65bb
fffff880`087579f0 fffff800`02fb6624 nt!IopCloseFile+0x11f
fffff880`08757a80 fffff800`02fd0251 nt!ObpDecrementHandleCount+0xb4
fffff880`08757b00 fffff800`02fd0164 nt!ObpCloseHandleTableEntry+0xb1
fffff880`08757b90 fffff800`02cba953 nt!ObpCloseHandle+0x94
fffff880`08757be0 00000000`77bff7aa nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`08757be0)
00000000`002fd848 00000000`00000000 ntdll!ZwClose+0xa
After some extensive testing we have concluded that the issue is related to ESET NOD32 Antivirus. Thank you all for your help and information provided.
Not sure about possible reasons, but in my opinion the best way to proceed would be to generate a complete memory dump during the freeze and look for clues there using WinDbg. See here how to create a BSOD using the keyboard.
Since the freeze is caused by unfinished I/O, the problem is likely in some driver.
I would start with !analyze on the kernel memory dump you collected. In many cases it can pinpoint the problem right away, or give a clue on where to look.
If !analyze does not report anything useful, you can try to find out which driver is involved in the freeze by looking at the IRP of the stalled thread. Find the stalled thread with the !thread command, locate the IRP address in its output, and pass that address to the !irp command. The IRP might look something like this:
kd> !irp 8420d320
Irp is active with 4 stacks 2 is current (= 0x8420d3b4)
No Mdl: No System Buffer: Thread 8426b420: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
>[ 0, 0] 0 e0 84d1e878 84275bd0 86be23be-83bb1908 Success Error Cancel
\FileSystem\mrxsmb mup!MupiUncProviderCompletion
Args: 88795aac 01220044 00050000 00000000
[ 0, 0] 0 e0 84723700 84275bd0 863bf4de-84bc9228 Success Error Cancel
\FileSystem\Mup fltmgr!FltpSynchronizedOperationCompletion
Args: 88795aac 01220044 00050000 00000000
[ 0, 0] 0 0 847231f0 84275bd0 00000000-00000000
\FileSystem\FltMgr
Args: 88795aac 01220044 00050000 00000000
The active stack location in the IRP is marked with >. In the example above it is waiting for the \FileSystem\mrxsmb device.
To make the investigation easier, take the time to configure a kernel debugger for the machine. This is optional, but it is easier than handling memory dumps.
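If you have many !irp dumps to go through, picking out the active stack location can be scripted. The following is a rough sketch in Python, assuming the output has been saved as text in the layout shown above; `current_irp_driver` is a made-up helper name, not part of WinDbg.

```python
def current_irp_driver(irp_output):
    """Return the device/driver name of the stack location marked with '>'.

    Assumes the text layout of the !irp example above: the marked line is
    followed by a line that starts with the device name (e.g. \\FileSystem\\...).
    Returns None if no marked location or no device line is found.
    """
    lines = [ln.strip() for ln in irp_output.splitlines()]
    for i, line in enumerate(lines):
        if not line.startswith(">"):
            continue
        for follow in lines[i + 1:]:
            if follow.startswith("\\"):
                # First token is the device; the rest is the completion routine.
                return follow.split()[0]
            if follow.startswith("[") or follow.startswith(">"):
                break  # reached the next stack location without a device line
        return None
    return None
```

For the dump above this returns `\FileSystem\mrxsmb`, which is the device the IRP is currently waiting on.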
From the stack dump, the "eamon.sys" driver seems to be in the middle of the battle. Like you said, this driver is related to ESET's NOD32 Antivirus.
If you add to this the fact that everything works fine without it, then you should stop your research here. Antivirus software packages typically install kernel-mode drivers so they can do their work efficiently. The downside is that when they have problems, they can easily hang a machine completely or cause BSODs.
Googling a bit, there are other similar reports about this particular software (http://www.wilderssecurity.com/archive/index.php/t-259245.html).
You should contact the vendor and see if it's normal or if they have an update or a way to fix this.
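The way eamon stands out in the stack can also be checked mechanically. Frames shown as module+offset, with no function name, are ones the debugger had no symbols for, and with Microsoft symbols loaded that usually points at a third-party driver. Below is a small Python sketch of that heuristic, assuming the stack has been saved as text in the three-column form posted in the question; `frames_without_symbols` is a hypothetical helper name.

```python
import re

# child-SP, return address, then either module!symbol+off or module+off
FRAME_RE = re.compile(
    r"^[0-9a-f`]+\s+[0-9a-f`]+\s+(?P<module>[^\s!+]+)(?P<sep>[!+])\S+",
    re.IGNORECASE,
)


def frames_without_symbols(stack_text):
    """List modules that appear only as module+offset, in stack order."""
    modules = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("sep") == "+":
            name = match.group("module")
            if name not in modules:
                modules.append(name)
    return modules
```

Applied to the trace in the question, this should report only `eamon`. It is a heuristic, not proof: a module can lack symbols for harmless reasons, so treat the result as a pointer on where to look first.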
Sounds like waiting for disk I/O requests that don't complete in a timely fashion.
Either you have an I/O error (bus disconnect, things like that) or an excessively long queue of disk requests causing a lot of seeking. Pagefile usage can cause this.
What does Task Manager show for memory utilization?