 

How to handle Azure App Service not recovering from "Http Error 500.37 - ANCM Failed to Start Within Startup Time Limit"

Http Error 500.37 - ANCM Failed to Start Within Startup Time Limit

We are seeing this error on our Azure App Services running .NET Core 3.1. It looks like when Azure updates the server farm, our instances get restarted and it tries to restart all app services at the same time. We do have a lot of services running on one instance, because it is a DEV/QA instance. The instance has enough resources for normal operation, but it looks like startup takes longer when everything is restarted at the same time.

The problem is that the app service doesn't recover from this, so our services only start working again when we restart the app manually.

Here they mention the error: https://learn.microsoft.com/en-us/aspnet/core/test/troubleshoot-azure-iis?view=aspnetcore-3.1#:~:text=500.37%20ANCM%20Failed%20to%20Start%20Within%20Startup%20Time%20Limit&text=By%20default%2C%20the%20timeout%20is,startup%20process%20of%20multiple%20apps.

But the guidance there is to "stagger the startup process of multiple apps." On an update of the server farm, I don't think we have that ability, correct? That seems to be confirmed here: https://twitter.com/martincetkovsky/status/1231160330488774657?lang=en

Based on this: https://learn.microsoft.com/en-us/aspnet/core/host-and-deploy/aspnet-core-module?view=aspnetcore-3.1#attributes-of-the-aspnetcore-element

startupTimeLimit
Duration in seconds that the module waits for the executable to start a process listening on the port. If this time limit is exceeded, the module kills the process. The module attempts to relaunch the process when it receives a new request and continues to attempt to restart the process on subsequent incoming requests unless the app fails to start rapidFailsPerMinute number of times in the last rolling minute.
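To make the quoted relaunch rule concrete, here is a toy Python model of it. It is only a sketch of the behavior as the documentation describes it, not ANCM's actual code. The class `RelaunchPolicy` and its method are hypothetical. The defaults (`startupTimeLimit` of 120 seconds, `rapidFailsPerMinute` of 10) are the module's documented defaults.

```python
from collections import deque


class RelaunchPolicy:
    """Toy model of the documented startupTimeLimit / rapidFailsPerMinute rule.

    Hypothetical helper for illustration only; it is NOT ANCM's implementation.
    """

    def __init__(self, startup_time_limit=120, rapid_fails_per_minute=10):
        self.startup_time_limit = startup_time_limit  # seconds
        self.rapid_fails_per_minute = rapid_fails_per_minute
        self.failures = deque()  # times (seconds) at which a start was killed

    def _prune(self, now):
        # Forget failures that fell out of the rolling 60-second window.
        while self.failures and now - self.failures[0] >= 60:
            self.failures.popleft()

    def on_request(self, now, startup_seconds):
        """A request arrives at `now` while the process is not running.

        `startup_seconds` is how long the process would need to start listening.
        Returns "started", "timed_out" or "blocked".
        """
        self._prune(now)
        if len(self.failures) >= self.rapid_fails_per_minute:
            return "blocked"  # too many failed starts in the last rolling minute
        if startup_seconds > self.startup_time_limit:
            # The module kills the process once the time limit is exceeded.
            self.failures.append(now + self.startup_time_limit)
            return "timed_out"
        return "started"


policy = RelaunchPolicy()
print(policy.on_request(now=0, startup_seconds=150))   # timed_out (busy farm)
print(policy.on_request(now=180, startup_seconds=40))  # started (next request)
```

Under this model, a timed-out start blocks relaunching for at most about a minute. A later request should then start the app again, which is the behavior the question expects but does not observe.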

This implies the app would retry, at least after one minute, but that doesn't seem to be the case for us. Could this be an incorrect configuration on our end?

I would be OK with getting some of these errors after an update (it is DEV/QA after all), but if the app doesn't recover, that is a problem. In prod we shouldn't see this, because we have more resources available, but auto-recovery is important there too.

How can I make sure our services don't get stuck in this state, other than running heavily oversized server farms (with the associated cost)?

asked Jun 29 '20 at 20:06 by Erik Steinebach






1 Answer

Based on Microsoft's recommendation, I went ahead and set up AutoHeal on our web apps.

This is the ARM template excerpt I am using:

    "autoHealEnabled": true,
    "autoHealRules": {
      "triggers": {
        "privateBytesInKB": 0,
        "statusCodes": [
          {
            "status": 500,
            "subStatus": 37, //Startup time limit 120000 in DEV and QA
            "win32Status": 0,
            "count": 1,
            "timeInterval": "00:01:00"
          }
        ]
      },
      "actions": {
        "actionType": "Recycle",
        "minProcessExecutionTime": "00:00:00"
      }
    }

The deployment of this change is still ongoing in our environment, so I haven't fully verified that it solves the issue, but it seems promising.
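A rough Python model shows how the `statusCodes` trigger in the template above reads. It counts responses that match `status`/`subStatus` within the rolling `timeInterval` and fires the `Recycle` action once `count` is reached. The class and function names are hypothetical. App Service's real evaluation may differ, for example in how it resets the window after acting.

```python
from collections import deque
from datetime import timedelta


def parse_interval(text):
    """Parse an ARM-style "hh:mm:ss" string such as "00:01:00"."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


class StatusCodeTrigger:
    """Hypothetical model of one entry in autoHealRules.triggers.statusCodes."""

    def __init__(self, rule):
        self.status = rule["status"]
        self.sub_status = rule["subStatus"]
        self.count = rule["count"]
        self.window = parse_interval(rule["timeInterval"]).total_seconds()
        self.hits = deque()  # times (seconds) of matching responses

    def observe(self, now, status, sub_status):
        """Record one response; return True when the Recycle action should fire."""
        if (status, sub_status) != (self.status, self.sub_status):
            return False
        self.hits.append(now)
        # Keep only matching responses inside the rolling timeInterval window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        return len(self.hits) >= self.count


# Same values as the template above: a single 500.37 within one minute.
rule = {"status": 500, "subStatus": 37, "win32Status": 0,
        "count": 1, "timeInterval": "00:01:00"}
trigger = StatusCodeTrigger(rule)
print(trigger.observe(0, 200, 0))   # False: not a 500.37
print(trigger.observe(5, 500, 37))  # True: recycle the app
```

With `count` set to 1, one startup timeout is enough to recycle the site. A higher `count` would tolerate a few 500.37 responses before acting.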

answered Oct 16 '22 at 08:10 by Erik Steinebach