For the last couple days we have seen intermittent crashes of the w3wp.exe worker process serving the main application pool for our corporate web site. Sometimes the crashes are isolated, and IIS is able to restart the worker process successfully. But if more than 5 crashes happen in 5 minutes, IIS Rapid Fail Protection kicks in and stops the application pool. Here is an example entry from the Application event log just before the crash:
An unhandled exception occurred and the process was terminated.
Application ID: /LM/W3SVC/2/ROOT
Process ID: 3640
Exception: System.Threading.ThreadAbortException
Message: Thread was being aborted.
StackTrace: at System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest wr, HttpContext context)
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)
Immediately after the crash due to the ThreadAbortException, there is a more serious event logged:
Faulting application name: w3wp.exe, version: 8.0.9200.16384, time stamp: 0x5010885f
Faulting module name: KERNELBASE.dll, version: 6.2.9200.17366, time stamp: 0x554d16f6
Exception code: 0xe0434352
Fault offset: 0x00010192
Faulting process id: 0xe38
Faulting application start time: 0x01d100dc662652d6
Faulting application path: C:\Windows\SysWOW64\inetsrv\w3wp.exe
Faulting module path: C:\Windows\SYSTEM32\KERNELBASE.dll
Report Id: db5b0d5b-6cd0-11e5-9418-005056900458
Faulting package full name:
Faulting package-relative application ID:
Now, a ThreadAbortException should never cause w3wp.exe to crash, seeing as it is thrown every time a standard Response.Redirect() is performed. MSDN confirms this, and I also confirmed it with a simple test. However, at least one other person has seen a similar crash recently with a similar environment: Thread.Abort in ASP.NET app causes w3wp.exe to crash. (But that may be an unrelated issue.)
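A simple test of this kind might look like the sketch below. It is not the exact test used here; the page name RedirectTest and the target ~/Default.aspx are hypothetical. It only shows that a standard redirect raises ThreadAbortException on the request thread and that ASP.NET normally absorbs it without ending the process.

```csharp
using System;
using System.Threading;
using System.Web.UI;

// Hypothetical code-behind class for a throwaway WebForms test page.
public class RedirectTest : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        try
        {
            // The single-argument overload ends the response, which calls
            // Response.End() and aborts the current request thread.
            Response.Redirect("~/Default.aspx");
        }
        catch (ThreadAbortException ex)
        {
            // Record that the exception was raised. Fully qualified because
            // Page.Trace would otherwise hide System.Diagnostics.Trace.
            System.Diagnostics.Trace.TraceInformation(
                "Redirect raised: " + ex.Message);
            // The runtime re-throws ThreadAbortException automatically at
            // the end of this catch block; ASP.NET then handles it.
        }
    }
}
```

If the worker process survives repeated requests to a page like this, the redirect itself is unlikely to be the direct cause of the crash.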
Our environment:
Background:
A couple days prior to the start of crashes, we upgraded to .NET 4.6. We have the new RyuJIT enabled (the default setting) and we have installed all updates to address the critical compiler issue described here: http://blogs.msdn.com/b/dotnet/archive/2015/07/28/ryujit-bug-advisory-in-the-net-framework-4-6.aspx.
We had also deployed a new version of our web code (as we do several times per week). Naturally we double-checked the code changes for any potential crash vulnerabilities, but none of our changes seem vulnerable to infinite loops, recursive stack overflows, or high memory usage -- the normal culprits when w3wp.exe crashes with an unhandled exception.
Sometimes the crash affects one web server within minutes of another, but other times only one web server is affected.
Things I've tried:
> 0:026> !clrstack
> OS Thread Id: 0x1ff0 (26)
> Child SP IP Call Site
> 2321f96c 771bdf8c [GCFrame: 2321f96c]
> 2321f9a4 771bdf8c [GCFrame: 2321f9a4]
Any ideas?
Update:
We have rolled back .NET 4.6 and recent Windows Updates on our web servers. We have been monitoring this for either 2 or 3 days, depending on when the server was rolled back, and in each case, there have been zero subsequent crashes, despite maintaining the same application code. This pretty definitively proves that either .NET 4.6 or the other Windows Updates caused the intermittent crashing, not our code, because w3wp.exe was previously crashing several times per day.
We are now trying to prove this to Microsoft Support, but it's an uphill battle because the issue was random, intermittent, and we could not reproduce it reliably. (They have provided a dump analysis but it seems to be a red herring.) We are also in the process of reapplying the updates in groups and waiting several days to watch for crashes, in an effort to isolate the faulty update. Obviously this is a tedious process.
Update #2:
We've now re-applied all the pre-.NET 4.6 Windows Updates that were removed in troubleshooting, and the servers have been running for several days without crashes. The only things left to re-apply are .NET 4.6 and its own updates, but my management is understandably reluctant to install things that will likely cause crashes in production. So I'm continuing to work with MS to analyze different crash dumps to pinpoint the problem.
You didn't show any code, but the evidence suggests this is an issue with your application code, and not with .NET 4.6 or with ThreadAbortException specifically.
Basic troubleshooting steps here: you said there were code changes AND environment changes; so rule one of them out.
Test the app on a VM with .NET 4.5 installed. If you do not get the error, .NET 4.6 may be the cause.
Test an older version of your app on the same server. If you see no issue, the code change is the likely cause.
Test the app on a machine with VS.NET installed, attach to the w3wp.exe process, and debug it (Tools > Attach to Process). Catch the ThreadAbortException and trace through it.
If you don't already, you should be logging the event that your w3wp.exe process stops, though this obviously will not handle all exceptions. Google this, but this guy describes a solution that I also use.
If you don't already, define an Application_Error handler in Global to log exceptions. Microsoft demonstrates this. Create a System.Web.Configuration option that you can toggle in your web.config file to enable different levels of logging, including writing to a local file, and writing to the Windows event logs, for example. You can also install a logging handler tool like Elmah.
Create a barebones web app and test Response.Redirect to verify whether it crashes w3wp.exe (the worker process) with .NET 4.6. I did this, and it didn't, so I suspect your code. It could also be a weird emergent issue at the server or patch level. These steps should help you pinpoint it.
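The Application_Error logging step above could be sketched roughly as follows. This is a minimal illustration, not a drop-in implementation: the appSettings key ErrorLogMode, its values File and EventLog, the log path ~/App_Data/errors.log, and the event source name "Application" are all assumptions made for the example.

```csharp
using System;
using System.Configuration;
using System.Diagnostics;
using System.IO;
using System.Web;

// Code-behind for Global.asax.
public class Global : HttpApplication
{
    // Hypothetical web.config switch, e.g.
    // <appSettings><add key="ErrorLogMode" value="File" /></appSettings>
    private const string LogModeKey = "ErrorLogMode";

    // Serializes file writes across concurrent requests.
    private static readonly object FileLock = new object();

    protected void Application_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();
        if (ex == null)
        {
            return;
        }

        string mode = ConfigurationManager.AppSettings[LogModeKey] ?? "None";
        HttpContext ctx = Context;
        string url = ctx != null ? ctx.Request.RawUrl : "(no request)";
        string entry = string.Format("{0:o} {1}{2}{3}{2}",
            DateTime.UtcNow, url, Environment.NewLine, ex);

        if (mode.Equals("File", StringComparison.OrdinalIgnoreCase))
        {
            // Assumed location; the app pool identity needs write access.
            string path = Server.MapPath("~/App_Data/errors.log");
            lock (FileLock)
            {
                File.AppendAllText(path, entry);
            }
        }
        else if (mode.Equals("EventLog", StringComparison.OrdinalIgnoreCase))
        {
            // Assumes the named event source is already registered.
            EventLog.WriteEntry("Application", entry, EventLogEntryType.Error);
        }
    }
}
```

Keep in mind that Application_Error only sees exceptions raised while a request is being processed, so a crash originating on a background thread would not show up in this log.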
Side note: Even though it shouldn't affect the app process, I recommend fixing the Response.Redirect() issues. We did this recently in an Enterprise app, and yes, it was a change of wide scope, but we no longer get the ThreadAbortException (TAE) errors. The fix is simple: call the overload that takes the endResponse flag, Response.Redirect(url, false); and then make sure that no code runs after that call (call return, for example). This post explains.
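The redirect pattern from the side note might look like this in a WebForms code-behind. The class name, handler name, and target URL are hypothetical. The call to CompleteRequest() is an addition commonly paired with this overload and is not part of the fix described above.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind class.
public class OrderPage : Page
{
    protected void SubmitButton_Click(object sender, EventArgs e)
    {
        // endResponse = false: the redirect is sent without calling
        // Response.End(), so no ThreadAbortException is raised.
        Response.Redirect("~/Done.aspx", false);

        // Optional: ask ASP.NET to skip the remaining pipeline events
        // and jump to EndRequest once the current event finishes.
        Context.ApplicationInstance.CompleteRequest();

        // Nothing else in this handler should run after the redirect.
        return;
    }
}
```

Because the thread is no longer aborted, any code placed after the redirect would still execute, and later page lifecycle events may still fire. That is why the early return (and guarding any later handlers) matters.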
@Jordan Rieger, this bug should be fixed in .NET 4.6.1. Can you please confirm whether the problem is fixed in the new framework, or whether it still persists? Thanks.