AWS AutoScalingGroup HealthCheckType 'ELB' considers instance "InService" prematurely

I'm trying to get AutoScalingRollingUpdate to work on my Auto Scaling group: bring new instances online, then terminate the old instances only once the new instance(s) are accepting traffic. It seems like AutoScalingRollingUpdate is designed for this purpose.

I have the HealthCheckType of my AutoScalingGroup set to 'ELB'. I also have the HealthCheck on the ELB set to require:

3 successful requests to / for "healthy"
10 unsuccessful requests to / for "unhealthy"
no grace period (zero, 0)

Now, from the ELB's perspective, when new instances come online, they are not InService for several minutes, which is what I expect. However, from the AutoScalingGroup's perspective, they are considered InService almost immediately, so my AutoScalingGroup takes healthy instances out of service before the new instances are actually ready to receive traffic. I'm confused about why the ASG thinks the instances are healthy before the ELB does, when the HealthCheckType is explicitly set to 'ELB'.

I've tried setting a grace period, but this doesn't change anything at all. In fact, I removed the grace period of 300 seconds because I thought maybe instances were implicitly "InService" during the grace period or something.

I know I can set a PauseTime on the rolling update policy, but that is fragile: sometimes instances fail while coming online and get nuked and replaced before they ever finish provisioning, so the PauseTime window may be exceeded. Also, I'd like to minimize the amount of time my app is running two different versions at the same time.

    ... ELB stuff ...

    "HealthCheck": {
      "HealthyThreshold": "3",
      "UnhealthyThreshold": "10",
      "Interval": "30",
      "Timeout": "15",
      "Target": {
        "Fn::Join": [
          "",
          [
            {"Fn::Join": [":", ["HTTP", {"Ref": "hostPort"}]]},
            {"Ref": "healthCheckPath"}
          ]
        ]
      }
    },

   ... ASG Stuff ...

  {
    ... snip ...

    "HealthCheckType": "ELB",
    "HealthCheckGracePeriod": "0",
    "Cooldown": "300"
  },
  "UpdatePolicy" : {
    "AutoScalingRollingUpdate" : {
      "MinInstancesInService" : "1",
      "MaxBatchSize" : "1"
    }
  }

asked Nov 25 '14 07:11

d11wtq

1 Answer

First, in our experience with CloudFormation, the ASG HealthCheckType and HealthCheckGracePeriod properties mostly matter outside of CloudFormation update events. They come into play any time a new instance is added to the ASG. That can happen during a CloudFormation update, but also during an Auto Scaling event or a self-healing event, when the ASG replaces an unhealthy instance. In the latter cases it is important to set the HealthCheckGracePeriod to a value that gives the new instance enough time to come online before the ELB health checks are considered.
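
As a rough sketch of how one might pick that value: the ELB needs HealthyThreshold consecutive passing checks, spaced Interval seconds apart, before it reports an instance as InService, so a grace period shorter than the provisioning time plus roughly HealthyThreshold multiplied by Interval is probably too short. The helper name and the 300-second provisioning figure are illustrative assumptions; the 3 and 30 are the HealthCheck values from the question.

```shell
#!/bin/sh
# Rough lower bound for HealthCheckGracePeriod, in seconds:
# provisioning time plus the time the ELB needs to see enough passing checks.
# min_grace_period is a hypothetical helper, not part of any AWS tooling.
min_grace_period() {
  provision_seconds=$1   # assumption: measured boot + bootstrap time
  healthy_threshold=$2   # ELB HealthyThreshold
  interval_seconds=$3    # ELB health check Interval, in seconds
  echo $(( provision_seconds + healthy_threshold * interval_seconds ))
}

# 300 s of provisioning is a made-up figure; measure your own instances.
min_grace_period 300 3 30   # prints 390
```

Treat the result as a floor and add some headroom, since provisioning time varies from launch to launch.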

It seems the capability you are most interested in is the UpdatePolicy, which is invoked when you run a CloudFormation update with a modified Launch Configuration. The key property is WaitOnResourceSignals, which forces the ASG to wait for a success signal from each new instance before considering the update a success.

  "UpdatePolicy" : {
    "AutoScalingRollingUpdate" : {
      "MinInstancesInService" : "1",
      "MaxBatchSize" : "1",
      "PauseTime" : "PT15M",
      "WaitOnResourceSignals" : "true"
    }
  },

When the WaitOnResourceSignals property is set to true, the PauseTime property becomes a timeout. If the ASG does not receive a signal within the PauseTime of 15 minutes, the update is considered a failure and the new instance is terminated. As soon as the ASG receives a success signal, the ASG health check comes into play, but only once the HealthCheckGracePeriod has expired. We typically set the HealthCheckGracePeriod to the same value as the PauseTime, which ensures that the ELB health check is never applied before the instance has had a chance to send a signal or hit the PauseTime timeout. See http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
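
Note that the two settings use different units: PauseTime is an ISO 8601 duration such as PT15M, while HealthCheckGracePeriod is a plain number of seconds. Here is a small sketch of the conversion, handling only the PT#H#M#S form. The function name is hypothetical and not part of any AWS tooling.

```shell
#!/bin/sh
# Convert a PT#H#M#S duration (the format PauseTime uses) to whole seconds.
# Only hours, minutes and seconds are supported; anything else is rejected.
iso_duration_to_seconds() {
  rest=${1#PT}
  # If stripping the PT prefix changed nothing, the input is not PT...
  [ "$rest" != "$1" ] || { echo "unsupported duration: $1" >&2; return 1; }
  total=0
  for unit in H:3600 M:60 S:1; do
    letter=${unit%%:*}
    factor=${unit##*:}
    case $rest in
      *"$letter"*)
        number=${rest%%"$letter"*}   # digits before the unit letter
        rest=${rest#*"$letter"}      # whatever follows the unit letter
        case $number in
          ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
        esac
        # Strip leading zeros so the shell does not read the number as octal.
        number=${number#"${number%%[!0]*}"}
        number=${number:-0}
        total=$(( total + number * factor ))
        ;;
    esac
  done
  # Leftover characters mean the input had an unknown or misordered part.
  [ -z "$rest" ] || { echo "unsupported duration: $1" >&2; return 1; }
  echo "$total"
}

iso_duration_to_seconds PT15M   # prints 900
```

So a PauseTime of PT15M corresponds to a HealthCheckGracePeriod of 900.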

Typically, the success signal is sent to the ASG by cfn-signal, run right after the cfn-init bootstrap script in the UserData of the ASG Launch Configuration.

"UserData"       : { "Fn::Base64" : { "Fn::Join" : ["", [
     "#!/bin/bash -xe\n",
     "yum update -y aws-cfn-bootstrap\n",

     "/opt/aws/bin/cfn-init -v ",
     "         --stack ", { "Ref" : "AWS::StackName" },
     "         --resource LaunchConfig ",
     "         --configsets full_install ",
     "         --region ", { "Ref" : "AWS::Region" }, "\n",

     "/opt/aws/bin/cfn-signal -e $? ",
     "         --stack ", { "Ref" : "AWS::StackName" },
     "         --resource WebServerGroup ",
     "         --region ", { "Ref" : "AWS::Region" }, "\n"
]]}}

This is sufficient for many cases, but sometimes the instance may still not be ready when we send the success signal back to the ASG. For example, we may want to wait on a background process to load data or wait for our application server to start. This is especially true if our ELB health check targets a URL that requires our application to be running. In these cases we want to delay the success signal until our instance is ready. Here is an example of how to create a Launch Configuration configSet to delay the signal until the ELB API returns an "InService" status for the instance.

  "verify_instance_health" : {
    "commands" : {
      "ELBHealthCheck" : {
        "command" : { "Fn::Join" : ["", [ 
          "until [ \"$state\" == \"\\\"InService\\\"\" ]; do ",
          "  state=$(aws --region ", { "Ref" : "AWS::Region" }, " elb describe-instance-health ",
          "              --load-balancer-name ", { "Ref" : "ElasticLoadBalancer" }, 
          "              --instances $(curl -s http://169.254.169.254/latest/meta-data/instance-id) ",
          "              --query InstanceStates[0].State); ",
          "  sleep 10; ",
          "done"
        ]]}
      }
    }
  }
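
The loop above runs until the instance becomes InService, so a broken instance is only caught by the PauseTime timeout. A variation, sketched here as a standalone script under a few assumptions, gives up after a fixed number of attempts and returns a non-zero status that the caller can act on (for example by passing it to cfn-signal -e). The function names and the ELB_NAME variable are hypothetical. The aws elb describe-instance-health call and the metadata URL are the same ones used above, with --output text added so the state compares as a bare word instead of a JSON-quoted string.

```shell
#!/bin/sh
# Poll a state-reporting command until it prints "InService" or attempts run out.
# Usage: wait_for_inservice CHECK_COMMAND MAX_ATTEMPTS POLL_SECONDS
# Returns 0 once the state is InService, 1 if every attempt fails.
wait_for_inservice() {
  check_command=$1
  max_attempts=$2
  poll_seconds=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failing check (API error, missing CLI) is treated as "not ready yet".
    state=$("$check_command" 2>/dev/null) || state=""
    if [ "$state" = "InService" ]; then
      return 0
    fi
    sleep "$poll_seconds"
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Real check: ask the Classic ELB API for this instance's state.
# ELB_NAME is a placeholder for your load balancer name (assumption).
# The metadata call assumes IMDSv1 is reachable, as in the original command.
elb_state() {
  instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  aws elb describe-instance-health \
    --load-balancer-name "$ELB_NAME" \
    --instances "$instance_id" \
    --query 'InstanceStates[0].State' --output text
}

# Example call on an instance (not executed here):
#   wait_for_inservice elb_state 84 10 || exit 1
```

With 84 attempts and a 10-second pause, the example call polls for roughly 14 minutes, just under the PT15M PauseTime. Keep in mind that the UserData script above starts with `#!/bin/bash -xe`, so a non-zero exit from cfn-init stops the script before cfn-signal runs, and the failure then shows up as a timeout instead of an explicit failure signal.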

See this AWS forum post for more information and a complete example using the ELB health check: https://forums.aws.amazon.com/ann.jspa?annID=2741

Note: These examples also require that you use the ASG CreationPolicy attribute to receive the signals during ASG creation. In the past, the WaitCondition and WaitConditionHandle resources were used to receive signals, but these are no longer recommended. The Count attribute is the number of signals that should be received at creation. This value should equal the ASG MinSize number.

  "CreationPolicy" : {
    "ResourceSignal" : {
      "Timeout" : "PT15M",
      "Count"   : "2"
    }
  },

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-creationpolicy.html

answered Oct 18 '22 16:10

Jason

Tags: amazon-web-services, amazon-cloudformation, amazon-elb, autoscaling
