I have a fairly high-load deployment on Azure: 4 Large instances serving about 300-600 requests per second. Under normal conditions the "Average Response Time" is 70 to 150 ms; occasionally it grows to 200-300 ms, which is perfectly acceptable.
However, once or twice a day (not during rush hours) I see the following picture on the Web Site Monitoring tab:
The number of requests per minute drops significantly, the average response time climbs to about 3 minutes, and after a while everything returns to normal.
During such a "blackout" only about 0.1% of requests are actually dropped (HTTP server errors with timeouts); the rest simply wait in the queue and are processed normally after a few minutes. Not all clients are willing to wait that long, though :-(
Memory usage stays under 30% the whole time, and CPU usage peaks at only 40-50%.
What I've already checked:
What could be causing this, and what should I check next?
Thank you all in advance!
Update 1: BenV suggested a good thing to try, but unfortunately it revealed nothing :-(
I configured process recycling every 500k requests and also added worker nodes, so CPU utilization now stays below 40% all day, but the blackouts still occur.
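For reference, a per-500k-requests recycle rule corresponds to the IIS applicationHost.config fragment below; the pool name here is a placeholder, not taken from the question, and setting the time attribute to 00:00:00 disables the default time-based recycle so only the request count triggers it:

```xml
<!-- system.applicationHost section of applicationHost.config (illustrative pool name) -->
<applicationPools>
  <add name="MyWebRolePool">
    <recycling>
      <!-- recycle the worker process after 500,000 requests; disable the timed recycle -->
      <periodicRestart requests="500000" time="00:00:00" />
    </recycling>
  </add>
</applicationPools>
```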
Update 2: Project uses ASP.Net MVC 4.
Use Auto-Heal. Auto-Heal recycles the worker process for your app based on triggers you choose (such as configuration changes, request counts, memory-based limits, or the time taken to execute a request).
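As a sketch, Auto-Heal can be set through the portal or via the site's ARM configuration resource; the trigger values below are illustrative (not from the question) and recycle the worker process when 20 requests take longer than a minute within a 5-minute window:

```json
{
  "properties": {
    "autoHealEnabled": true,
    "autoHealRules": {
      "triggers": {
        "slowRequests": {
          "timeTaken": "00:01:00",
          "count": 20,
          "timeInterval": "00:05:00"
        }
      },
      "actions": {
        "actionType": "Recycle",
        "minProcessExecutionTime": "00:05:00"
      }
    }
  }
}
```

The `minProcessExecutionTime` guard keeps a freshly started process from being recycled immediately if the trigger fires again during warm-up.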
I had this exact same problem. In my case the logs were full of WinCache errors.
Whenever the site failed, the log contained a lot of WinCache errors. WinCache is how IIS speeds up PHP processing; it's a Microsoft-built add-on that is enabled by default in IIS and on all Azure sites. WinCache would get hung up, and instead of recycling and continuing it would consume all the memory and file handles on an instance, essentially locking it up.
I added a new app setting in the Azure Portal (PHP_INI_SCAN_DIR) that tells PHP to scan a folder for additional .ini settings:
d:\home\site\ini
Then I added a file at d:\home\site\ini\settings.ini containing the following:
wincache.fcenabled=1
session.save_handler = files
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
wincache.ocenabled=0
wincache.fcenabled=1
Enables file caching using WinCache (I think that's the default anyway).
session.save_handler = files
Changes the session handler from WinCache (the Azure default) to standard file-based storage, to reduce stress on the cache engine.
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
Caps memory at 256 megabytes per process and limits the overall cache sizes. This forces WinCache to clear out old data and recycle the cache more often.
wincache.ocenabled=0
This is the big one: it DISABLES WinCache opcode caching, i.e. WinCache caching the compiled PHP scripts in memory. Files are still cached (wincache.fcenabled=1 above), but PHP is interpreted as normal rather than cached as large binary blobs.
I went from my Azure Website crashing about once every 3 days, with logs that looked like yours, to 120 days straight (so far) without any issues.
Good luck!
There are some nice tools available for Web Apps in the preview portal.
The Application Insights extension especially can be useful for monitoring and troubleshooting app performance.