I have an App Service on the P1V2 pricing tier that I use as the backend for my mobile app. I don't usually have many users, but a couple of months ago a spike in users made the app service unusable for hours at a time.
From the metrics, I can see that the app service is fine in terms of CPU and memory, but when the problem happens the thread count keeps climbing. It seems every request takes another thread but none of the threads are freed, so no requests complete during that time. If we reset the app service when that happens, the thread count drops momentarily but then explodes again. The only mitigation I have right now is to scale out the service when this happens, which takes a couple of minutes and costs me a lot of money and effort.
I have played around with setting the minimum and maximum threads on the thread pool and with limiting the maximum number of concurrent requests per CPU, but nothing has helped. I can reproduce the problem with just 5 of the most commonly used APIs. All of them make asynchronous calls to a SQL database that is also hosted on Azure. I use async/await, and the context is disposed after the call. I use Entity Framework as the ORM.
The app service plan I pay for should be able to handle the load easily, and as long as there is no sudden peak of requests it does so without a problem. But when the service goes down it can stay down for hours at a time, and restarting or stopping the service doesn't help at all. We have reverted the backend to older versions and the problem still occurs.
I can reproduce the problem easily by just blasting the backend with requests. Below you can find an example of what happens. One thing that stands out to us is that no matter how many requests we send, we have never seen the HTTP queue length go up.
To eliminate ThreadPool starvation, ThreadPool threads need to remain unblocked so that they're available to handle incoming work items. There are two ways to determine what each thread was doing, either using the dotnet-stack tool or capturing a dump with dotnet-dump that can be viewed in Visual Studio.
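To make the "blocked ThreadPool thread" idea concrete, here is a minimal console sketch, not the asker's code. It assumes .NET Core 3.0 or later (for `ThreadPool.ThreadCount`), and `QueryAsync`, `HandleBlocking`, and `HandleAsync` are hypothetical stand-ins for a database call and two request handlers. The first batch blocks on `.Result` (sync-over-async), so each request holds a pool thread while it waits; the second batch awaits all the way through and releases the thread during the simulated I/O.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationDemo
{
    // Hypothetical stand-in for an async database call (for example an EF query).
    private static async Task<int> QueryAsync()
    {
        await Task.Delay(200); // simulated I/O latency; no thread is held while waiting
        return 1;
    }

    // Anti-pattern: blocks a ThreadPool thread until the async call completes.
    private static int HandleBlocking() => QueryAsync().Result;

    // Preferred: the thread goes back to the pool while the call is in flight.
    private static async Task<int> HandleAsync() => await QueryAsync();

    public static async Task Main()
    {
        // Scale the burst with the core count so it exceeds the pool's default minimum.
        int requests = Environment.ProcessorCount * 8;

        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests)
            .Select(_ => Task.Run(() => HandleBlocking())));
        Console.WriteLine($"blocking: {sw.ElapsedMilliseconds} ms, pool threads: {ThreadPool.ThreadCount}");

        sw.Restart();
        await Task.WhenAll(Enumerable.Range(0, requests).Select(_ => HandleAsync()));
        Console.WriteLine($"async:    {sw.ElapsedMilliseconds} ms, pool threads: {ThreadPool.ThreadCount}");
    }
}
```

If the blocking batch takes noticeably longer and reports a much higher pool thread count, that matches the symptom described in the question: the thread count climbs while requests make no progress. Exact timings depend on the runtime version and the number of cores, so treat the numbers as indicative only.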
According to the "Understand metrics" doc, Thread Count is the number of threads currently active in the app process.
Starvation describes a situation where a thread is unable to gain regular access to shared resources and is unable to make progress. This happens when shared resources are made unavailable for long periods by "greedy" threads.
I'm facing the exact same problem, and we don't use async/await for DB access. The rise in thread usage is completely random.
Do you maybe use Redis cache and StackExchange? That's where I'm placing my bet (in my case).