
Azure App Service Autoscale Fails to Scale In

My App Service fails to scale in after scaling out. This is a recurring pattern that I've been trying to troubleshoot for several months.

My scale condition is based on CPU and memory. However, I've never seen CPU go past 12%, so I assume it's actually scaling on memory.

I've tried the following configurations, but neither has worked:

  1. Set the scale-out condition to memory over 90% on a 5-minute average with a 10-minute cooldown, and the scale-in condition to memory under 70% on a 5-minute average. This doesn't seem to make sense: if memory utilization is already at 90%, I probably have an underlying memory leak and should have scaled out already.

  2. Set the scale-out condition to memory over 80% on a 60-minute average with a 10-minute cooldown, and the scale-in condition to memory under 60% on a 5-minute average. This makes more sense, as I've seen memory usage burst over a few hours and then drop.
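To make the difference between the two averaging windows concrete, here is a minimal sketch. It is a simplified model, not how the autoscale engine is actually implemented: the samples are made-up one-minute `MemoryPercentage` readings, and `window_average` is a hypothetical helper.

```python
from statistics import mean


def window_average(samples, window_minutes):
    """Average of the most recent `window_minutes` one-minute samples."""
    return mean(samples[-window_minutes:])


# Hypothetical readings: 50 minutes at 55%, then a 10-minute burst at 95%.
samples = [55.0] * 50 + [95.0] * 10

avg_5 = window_average(samples, 5)    # short window, as in attempt 1
avg_60 = window_average(samples, 60)  # long window, as in attempt 2

print(f"5-minute average:  {avg_5:.1f}  -> scale out at >= 90? {avg_5 >= 90}")
print(f"60-minute average: {avg_60:.1f}  -> scale out at >= 80? {avg_60 >= 80}")
```

Under these assumed numbers the short window reacts to the burst, while the 60-minute window smooths it away and stays below the 80% threshold.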


Expected behavior: App Service autoscale reduces the instance count once memory usage has stayed below 60% for 5 minutes.

Question:

What are the ideal thresholds for scaling smoothly if my baseline CPU stays at roughly 6% on average and memory at 53%? In other words, what are the best minimum values to scale in on and the best maximum values to scale out on, without running into anti-patterns such as flapping? A larger gap of 20% between the two thresholds makes more sense to me.

Alternative solution:

Given the amount of troubleshooting involved in what's marketed as being as simple as "push button scaling", the vagueness of the configuration makes it almost not worth the headache (you can't even use IIS metrics like connection count without a custom PowerShell script!). I'm considering disabling autoscaling because of its unpredictability, keeping 2 instances running for automatic load balancing, and scaling manually.

Autoscale Configuration:

{
    "location": "East US 2",
    "tags": {
        "$type": "Microsoft.WindowsAzure.Management.Common.Storage.CasePreservedDictionary, Microsoft.WindowsAzure.Management.Common.Storage"
    },
    "properties": {
        "name": "CPU and Memory Autoscale",
        "enabled": true,
        "targetResourceUri": "/redacted",
        "profiles": [
            {
                "name": "Auto created scale condition",
                "capacity": {
                    "minimum": "1",
                    "maximum": "10",
                    "default": "1"
                },
                "rules": [
                    {
                        "scaleAction": {
                            "direction": "Increase",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT10M"
                        },
                        "metricTrigger": {
                            "metricName": "MemoryPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "GreaterThanOrEqual",
                            "statistic": "Average",
                            "threshold": 80,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT1H"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Decrease",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "MemoryPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "LessThanOrEqual",
                            "statistic": "Average",
                            "threshold": 60,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT10M"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Increase",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "CpuPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "GreaterThanOrEqual",
                            "statistic": "Average",
                            "threshold": 60,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT1H"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Decrease",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "CpuPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "LessThanOrEqual",
                            "statistic": "Average",
                            "threshold": 40,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT10M"
                        }
                    }
                ]
            }
        ],
        "notifications": [
            {
                "operation": "Scale",
                "email": {
                    "sendToSubscriptionAdministrator": false,
                    "sendToSubscriptionCoAdministrators": false,
                    "customEmails": [
                        "redacted"
                    ]
                },
                "webhooks": []
            }
        ],
        "targetResourceLocation": "East US 2"
    },
    "id": "/redacted",
    "name": "CPU and Memory Autoscale",
    "type": "Microsoft.Insights/autoscaleSettings"
}
Stuart asked Jan 28 '23 07:01


1 Answer

For the CpuPercentage metric, you have a scale-out action when it goes above 60 and a scale-in action when it drops below 40, and the gap between the two is very small. This can cause a behavior known as flapping, in which case autoscale will not perform the scale-in action. The MemoryPercentage rules you have configured have a similar issue.

You should keep a difference of at least 40 between your scale-out and scale-in thresholds to avoid flapping. More details on flapping are at https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/insights-autoscale-best-practices#choose-the-thresholds-carefully-for-all-metric-types (search for the word "Flapping").
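To illustrate why a small gap blocks scale-in, here is a rough sketch of the kind of estimate the best-practices page describes: before scaling in, autoscale projects what the per-instance metric would be on the remaining instances, and skips the action if that projection would immediately trigger a scale-out. The function names are hypothetical, and the even-redistribution assumption is a simplification (memory in particular may not redistribute linearly), so treat it as a way to reason about thresholds rather than a reproduction of the service's logic.

```python
def projected_after_scale_in(avg_metric, instances, step=1):
    """Estimate the per-instance average after removing `step` instances,
    assuming the total load is spread evenly over the remaining ones."""
    remaining = instances - step
    if remaining < 1:
        raise ValueError("cannot scale in below one instance")
    return avg_metric * instances / remaining


def scale_in_would_flap(avg_metric, instances, scale_out_threshold, step=1):
    """True if the projected value would meet the scale-out threshold."""
    projected = projected_after_scale_in(avg_metric, instances, step)
    return projected >= scale_out_threshold


# Values from the question: memory baseline 53%, scale-out threshold 80%.
print(projected_after_scale_in(53, 2))   # 2 -> 1 instance
print(scale_in_would_flap(53, 2, 80))    # projection exceeds 80, so scale-in is skipped
print(scale_in_would_flap(53, 4, 80))    # 4 -> 3 instances stays under 80
```

With a 53% memory baseline on 2 instances, the projection for a single instance is well above the 80% scale-out threshold, which under this model would explain why the scale-in never happens even though the 60% scale-in condition is met.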

Puneet Gupta answered Feb 01 '23 14:02