As the load on our Azure website has increased (along with the complexity of the work that it's doing), we've noticed that we're running into CPU utilization issues. CPU utilization gradually creeps upward over the course of several hours, even as traffic levels remain fairly steady. Over time, if the Azure stats are correct, we'll somehow manage to get > 60 seconds of CPU per instance (not quite clear how that works), and response times will start increasing dramatically.
If I restart the web server, CPU drops immediately, and then begins a slow creep back up. For instance, in the image below, you can see CPU creeping up, followed by the restart (with the red circle) and then a recovery of the CPU.
I'm strongly inclined to suspect that this is a problem somewhere in my own code, but I'm scratching my head as to how to figure it out. So far, any attempts to reproduce this on my dev or testing environments have proven ineffectual. Nearly all the suggestions for profiling IIS/C# performance seem to presume either direct access to the machine in question or at least a "Cloud Service" instance rather than an Azure Website.
I know this is a bit of a long shot, but... any suggestions, either for what it might be, or how to troubleshoot it?
(We're using C# 5.0, .NET 4.5.1, ASP.NET MVC 5.2.0, WebAPI 2.2, EF 6.1.1, Azure System Bus, Azure SQL Database, Azure redis cache, and async for every significant code path.)
Edit 8/5/14 - I've tried some of the suggestions below. But when the website is truly busy, i.e., ~100% CPU utilization, any attempt to download a mini-dump or GC dump results in a 500 error, with the message, "Not enough storage." The various times that I have been able to download a mini-dump or GC dump, they haven't shown anything particularly interesting, at least, so far as I could figure out. (For instance, the most interesting thing in the GC dump was a half dozen or so >100KB string instances - those seem to be associated with the bundling subsystem in some fashion, so I suspect that they're just cached ScriptBundle
or StyleBundle
instances.)
Go to specific azure resource such as web app and than click on Diagnose and solve problem -> Availability and Performance -> CPU Usage.
CPU (Short) The amount of CPU allowed for this app in a 5-minute interval. This quota resets every five minutes.
https://{sitename}.scm.azurewebsites.net/ProcessExplorer/
there you can take memory dumps an GC dumps of your w3wp process.
Then you can compare 2 GC dumps to find memory leaks and open the memory dump with windbg/VS for further "offline" debugging.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With