CloudFormation AutoScalingGroup not waiting for signal on update/scale-up

I'm working with a CloudFormation template that brings up as many instances as I request, and I want it to wait for them to finish initialising (via User Data) before the stack creation/update is considered complete.

The Expectation

Creating or updating the stack should wait for signals from all newly created instances, so as to ensure that their initialisation is complete.

I don't want the stack creation or update to be considered successful if any of the created instances fail to initialise.

The Reality

CloudFormation only seems to wait for signals from instances when the stack is first created. Updating the stack and increasing the number of instances seems to disregard signalling. The update operation finishes successfully very quickly, whilst instances are still being initialised.

Instances created as a result of updating the stack can fail to initialise, but the update action would've already been considered a success.

The Question

Using CloudFormation, how can I make the reality meet the expectation?

I want the behaviour that applies when the stack is created to also apply when the stack is updated.

Reproducing

To demonstrate the problem, I've created a template based on the example beneath the Auto Scaling Group header on this AWS documentation page, which includes signalling.

The template has been adapted as follows:

It uses an Ubuntu AMI (in region ap-northeast-1). Because of this, cfn-signal is bootstrapped and invoked accordingly.
A new parameter dictates how many instances to launch in the auto scaling group.
A sleep of 2 minutes has been added before signalling, to simulate the time spent initialising.

Here's the template, saved to template.yml:

Parameters:
  DesiredCapacity:
    Type: Number
    Description: How many instances would you like in the Auto Scaling Group?

Resources:
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: !GetAZs ''
      LaunchConfigurationName: !Ref LaunchConfig
      MinSize: !Ref DesiredCapacity
      MaxSize: !Ref DesiredCapacity
    CreationPolicy:
      ResourceSignal:
        Count: !Ref DesiredCapacity
        Timeout: PT5M
    UpdatePolicy:
      AutoScalingScheduledAction:
        IgnoreUnmodifiedGroupSizeProperties: true
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 2
        PauseTime: PT5M
        WaitOnResourceSignals: true

  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-b7d829d6
      InstanceType: t2.micro
      UserData:
        'Fn::Base64':
          !Sub |
            #!/bin/bash -xe
            sleep 120

            apt-get -y install python-setuptools
            TMP=`mktemp -d`
            curl https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz | \
              tar xz -C $TMP --strip-components 1
            easy_install $TMP

            /usr/local/bin/cfn-signal -e $? \
              --stack ${AWS::StackName} \
              --resource AutoScalingGroup \
              --region ${AWS::Region}

Now I create the stack with a single instance, via:

$ aws cloudformation create-stack \
  --region=ap-northeast-1 \
  --stack-name=asg-test \
  --template-body=file://template.yml \
  --parameters ParameterKey=DesiredCapacity,ParameterValue=1

After waiting a few minutes for the creation to complete, let's look at some key stack events:

$ aws cloudformation describe-stack-events \
  --region=ap-northeast-1 \
  --stack-name=asg-test

    ...
    {
        "Timestamp": "2017-02-03T05:36:45.445Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatus": "CREATE_COMPLETE",
        ...
    },
    {
        "Timestamp": "2017-02-03T05:36:42.487Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatusReason": "Received SUCCESS signal with UniqueId ...",
        "ResourceStatus": "CREATE_IN_PROGRESS"
    },
    {
        "Timestamp": "2017-02-03T05:33:33.274Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatusReason": "Resource creation Initiated",
        "ResourceStatus": "CREATE_IN_PROGRESS",
        ...
    }
    ...

You can see that the auto scaling group started initiating at 05:33:33. At 05:36:42 (3 minutes after initiation), it received a success signal. This allowed the auto scaling group to reach its own success status only moments after, at 05:36:45.

That's awesome - working like a charm.

Now let's try increasing the number of instances in this auto scaling group to 2 by updating the stack:

$ aws cloudformation update-stack \
  --region=ap-northeast-1 \
  --stack-name=asg-test \
  --template-body=file://template.yml \
  --parameters ParameterKey=DesiredCapacity,ParameterValue=2

After waiting a much shorter time for the update to complete, let's look at some of the new stack events:

$ aws cloudformation describe-stack-events \
  --region=ap-northeast-1 \
  --stack-name=asg-test

    {
        "ResourceStatus": "UPDATE_COMPLETE",
        ...
        "ResourceType": "AWS::CloudFormation::Stack",
        ...
        "Timestamp": "2017-02-03T05:45:47.063Z"
    },
    ...
    {
        "ResourceStatus": "UPDATE_COMPLETE",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        "Timestamp": "2017-02-03T05:45:43.047Z"
    },
    {
        "ResourceStatus": "UPDATE_IN_PROGRESS",
        ...,
        "LogicalResourceId": "AutoScalingGroup",
        "Timestamp": "2017-02-03T05:44:20.845Z"
    },
    {
        "ResourceStatus": "UPDATE_IN_PROGRESS",
        ...
        "ResourceType": "AWS::CloudFormation::Stack",
        ...
        "Timestamp": "2017-02-03T05:44:15.671Z",
        "ResourceStatusReason": "User Initiated"
    },
    ....

Now you can see that whilst the auto scaling group started updating at 05:44:20, it completed at 05:45:43 - that's less than one and a half minutes to completion, which shouldn't be possible considering a sleep time of 120 seconds in the user data.

The stack update then proceeds to completion without the auto scaling group ever having received any signals.

The new instance does indeed exist.

In my real use case I've SSHed into one of these new instances to find that it was still in the process of initialising even after the stack update completed.

What I've Tried

I've read and re-read the documentation surrounding CreationPolicy and UpdatePolicy, but have failed to identify what I'm missing.

Taking a look at the update policy in use above, I don't understand what it's actually doing. Why is WaitOnResourceSignals true, but it's not waiting? Is it serving some other purpose?

Or are these new instances not falling under the "rolling update" policy? If they don't belong there, then I'd expect them to fall under the creation policy, but that doesn't seem to apply either.

As such, I don't really know what else to try.

I have a sneaking feeling that it's functioning as designed/expected, but if it is then what's the point of that WaitOnResourceSignals property and how can I meet the expectation set above?

asked Feb 03 '17 05:02

Bilal Akil

1 Answer

The AutoScalingRollingUpdate policy handles rotating out an entire set of instances in an Auto Scaling group in response to changes to the underlying LaunchConfiguration. It doesn't apply to individual changes to the number of instances in the existing group. According to the UpdatePolicy Attribute documentation,

The AutoScalingReplacingUpdate and AutoScalingRollingUpdate policies apply only when you do one or more of the following:

Change the Auto Scaling group's AWS::AutoScaling::LaunchConfiguration.

Change the Auto Scaling group's VPCZoneIdentifier property

Update an Auto Scaling group that contains instances that don't match the current LaunchConfiguration.

Changing the Auto Scaling group's DesiredCapacity property is not in this list, so the AutoScalingRollingUpdate policy does not apply to this type of change.

As far as I know, it is not possible (using standard AWS CloudFormation resources) to delay the completion of a Stack Update modifying DesiredCapacity until any new instances added to the Auto Scaling Group are fully provisioned.

Here are some alternative options:

1. Instead of modifying only DesiredCapacity, modify a LaunchConfiguration property at the same time. This will trigger an AutoScalingRollingUpdate to the desired capacity (the downside is that it will also update existing instances, which may not actually need to be modified).
2. Add an AWS::AutoScaling::LifecycleHook resource to your Auto Scaling Group, and call aws autoscaling complete-lifecycle-action in addition to cfn-signal, to signal lifecycle-hook completion. This won't delay your CloudFormation stack update as desired, but it will delay the individual auto-scaled instances from entering the InService state until the lifecycle signal is received. (See Lifecycle Hooks documentation for more info.)
3. As an extension to #2, it should be possible to add a Lifecycle Hook to your Auto Scaling group, as well as a Custom Resource that polls your Auto Scaling Group and only completes when the Auto Scaling group contains the DesiredCapacity number of instances all in the InService state.
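
The two lifecycle-hook options above can be sketched roughly as follows. This is only an illustration, not a tested setup. It assumes a launch lifecycle hook already exists on the group (the name InitHook in the examples is hypothetical), that the AWS CLI is installed on the instance, that the instance profile allows autoscaling:DescribeAutoScalingInstances and autoscaling:CompleteLifecycleAction, and that the instance metadata endpoint answers plain (IMDSv1-style) requests. The function names are made up for this sketch. The polling loop performs the same check a Custom Resource would, but from an operator's shell rather than from inside CloudFormation.

```shell
#!/usr/bin/env bash
# Sketch only: the function names and the hook name used in the examples
# are hypothetical, not part of the original question or answer.

# Runs ON THE INSTANCE, at the end of its initialisation.
# Completes the launch lifecycle hook so the instance can move to InService.
# Pass ABANDON as the third argument to report a failed initialisation.
complete_launch_hook() {
  local hook_name=$1 region=$2 result=${3:-CONTINUE}
  local instance_id asg_name
  instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  # Look the group name up at run time instead of referencing the group
  # from the launch configuration in the template.
  asg_name=$(aws autoscaling describe-auto-scaling-instances \
    --region "$region" \
    --instance-ids "$instance_id" \
    --query 'AutoScalingInstances[0].AutoScalingGroupName' \
    --output text)
  aws autoscaling complete-lifecycle-action \
    --region "$region" \
    --auto-scaling-group-name "$asg_name" \
    --lifecycle-hook-name "$hook_name" \
    --instance-id "$instance_id" \
    --lifecycle-action-result "$result"
}

# Runs wherever the stack update is driven from.
# Prints how many instances of the group are currently InService.
count_in_service() {
  local asg_name=$1 region=$2
  aws autoscaling describe-auto-scaling-groups \
    --region "$region" \
    --auto-scaling-group-names "$asg_name" \
    --query "length(AutoScalingGroups[0].Instances[?LifecycleState=='InService'])" \
    --output text
}

# Polls until at least `wanted` instances are InService.
# Returns 0 on success, 1 once `timeout` seconds have been waited.
wait_for_in_service() {
  local asg_name=$1 wanted=$2 region=$3
  local timeout=${4:-600} interval=${5:-15}
  local waited=0 count
  while :; do
    count=$(count_in_service "$asg_name" "$region") || count=0
    # Treat empty or non-numeric output (for example an error) as zero.
    case $count in ''|*[!0-9]*) count=0 ;; esac
    [ "$count" -ge "$wanted" ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example (operator side), using the stack from the question:
#   asg=$(aws cloudformation describe-stack-resource \
#     --region ap-northeast-1 --stack-name asg-test \
#     --logical-resource-id AutoScalingGroup \
#     --query StackResourceDetail.PhysicalResourceId --output text)
#   wait_for_in_service "$asg" 2 ap-northeast-1
```

The InService count is only a meaningful readiness check when the lifecycle hook is in place, since without it instances become InService before their User Data has finished. If complete_launch_hook is pasted into User Data wrapped in !Sub, shell expansions such as ${3:-CONTINUE} need to be written as ${!3:-CONTINUE} so that CloudFormation leaves them alone.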

answered Nov 06 '22 10:11

wjordan

Tags: amazon-web-services, amazon-cloudformation, autoscaling