
Is it possible to have an AWS EC2 scale group that defaults to 0 and only contains instances when there is work to do?

I am trying to set up an EC2 Auto Scaling group that scales depending on how many items are in an SQS queue.

When the SQS queue has visible items, I need the scaling group to have 1 instance available, and when the SQS queue is empty (i.e. there are no visible or non-visible messages) I want there to be 0 instances.

Desired instances is set to 0, min is set to 0 and max is set to 1.

I have set up CloudWatch alarms on my SQS queue: one triggers when visible messages are greater than zero, and another triggers when non-visible messages are less than one (i.e. no more work to do).

Currently the CloudWatch alarm triggers and an instance is created, but the scaling group then automatically kills the instance to meet the desired setting. I expected the alarm to adjust the desired instance count within the min and max settings, but this does not seem to be the case.

Stephen asked Mar 19 '17 18:03



1 Answer

Yes, you can certainly have an Auto Scaling group with:

  • Minimum = 0
  • Maximum = 1
  • Alarm: When ApproximateNumberOfMessagesVisible > 0 for 1 minute, Add 1 Instance

This will cause Auto Scaling to launch an instance when there are messages waiting in the queue. It will keep trying to launch more instances, but the Maximum setting will limit it to 1 instance.
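As a rough sketch of the scale-out setup above, the two dictionaries below hold the keyword arguments one might pass to boto3's `put_scaling_policy` and `put_metric_alarm`. The group name, queue name, policy name and alarm name are hypothetical, and the choice of the `Maximum` statistic is an assumption rather than something stated above.

```python
# Sketch only: all resource names here are hypothetical placeholders.

def scale_out_policy_params(group_name: str) -> dict:
    """Keyword arguments for autoscaling.put_scaling_policy: add 1 instance."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "add-one-instance",      # hypothetical name
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": 1,                # "Add 1 Instance"
    }


def scale_out_alarm_params(queue_name: str, policy_arn: str) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"{queue_name}-has-messages",  # hypothetical name
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",                # assumption
        "Period": 60,                          # one 1-minute period
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # > 0
        "AlarmActions": [policy_arn],          # run the policy on ALARM
    }


# With boto3 installed and credentials configured, usage might look like:
#   import boto3
#   policy = scale_out_policy_params("my-workers")
#   arn = boto3.client("autoscaling").put_scaling_policy(**policy)["PolicyARN"]
#   boto3.client("cloudwatch").put_metric_alarm(
#       **scale_out_alarm_params("my-queue", arn))
```

The Maximum = 1 setting on the group itself is what caps the repeated "add 1" actions, so the policy does not need its own upper bound.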

Scaling-in when there are no messages is a little bit trickier.

Firstly, it can be difficult to actually know when to scale-in. If there are messages waiting to be processed, then ApproximateNumberOfMessagesVisible will be greater than zero. However, if there are no messages waiting, it doesn't necessarily mean you wish to scale-in, because messages might be currently processing ("in flight"), as indicated by ApproximateNumberOfMessagesNotVisible. So, you only want to scale-in if both of these are zero. Unfortunately, a CloudWatch alarm can only reference one metric, not two.
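To make the "both must be zero" condition concrete, here is a minimal sketch of the decision as a plain function. The function name and arguments are hypothetical. It only illustrates the logic and is not something CloudWatch evaluates for you.

```python
def should_scale_in(visible: int, not_visible: int) -> bool:
    """Return True only when no messages are waiting AND none are in flight.

    visible     -> ApproximateNumberOfMessagesVisible
    not_visible -> ApproximateNumberOfMessagesNotVisible
    """
    return visible == 0 and not_visible == 0


print(should_scale_in(0, 0))  # queue idle, safe to scale in
print(should_scale_in(0, 2))  # nothing waiting, but 2 messages still in flight
print(should_scale_in(3, 0))  # work is waiting
```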

Secondly, when an Amazon SQS queue is empty, it does not send metrics to Amazon CloudWatch. This sort of makes sense, because queues are mostly empty, so it would be continually sending a zero metric. However, it means CloudWatch receives no data points while the queue is empty, so the alarm will enter the INSUFFICIENT_DATA state rather than ALARM.

Therefore, you could create your alarm as:

  • When ApproximateNumberOfMessagesVisible = 0 for 15 minutes, Remove 1 instance, but set the action to trigger on INSUFFICIENT_DATA rather than ALARM

Note the suggested "15 minutes" delay to avoid thrashing instances. This is where instances are added and removed in rapid succession because messages are coming in regularly, but infrequently. Therefore, it is better to wait a while before deciding to scale-in.
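A sketch of that scale-in alarm, again as keyword arguments for boto3's `put_metric_alarm`. The alarm name is hypothetical, and splitting the 15 minutes into three 5-minute periods is an assumption. The key detail is that the scaling policy is listed under `InsufficientDataActions`, not `AlarmActions`.

```python
def scale_in_alarm_params(queue_name: str, remove_policy_arn: str) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm (scale-in side)."""
    return {
        "AlarmName": f"{queue_name}-is-empty",   # hypothetical name
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",                  # assumption
        "Period": 300,                           # 5-minute periods (assumption)
        "EvaluationPeriods": 3,                  # 3 x 5 min = 15 minutes
        "Threshold": 0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # Fire the "remove 1 instance" policy when the alarm has no data,
        # which is the state described above for an empty queue.
        "InsufficientDataActions": [remove_policy_arn],
    }


# Usage, assuming boto3 and an existing "remove one instance" policy ARN:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **scale_in_alarm_params("my-queue", remove_policy_arn))
```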

This leaves the problem of having instances terminated while they are still processing messages. This can be avoided by taking advantage of Auto Scaling Lifecycle Hooks, which send a signal when an instance is about to be terminated, giving the application the opportunity to delay the termination until work is complete. Your application should then signal that it is ready for termination only when message processing has finished.
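As an illustration of the lifecycle hook idea, the sketch below keeps the hook alive with heartbeats while work is in progress and then tells Auto Scaling to continue with termination. `TerminationGuard`, `is_busy` and the way the instance learns that termination has started are all hypothetical. The two client methods are the boto3 Auto Scaling calls `record_lifecycle_action_heartbeat` and `complete_lifecycle_action`.

```python
import time
from typing import Callable


class TerminationGuard:
    """Delay instance termination until message processing has finished."""

    def __init__(self, autoscaling_client, group_name: str,
                 hook_name: str, instance_id: str):
        # autoscaling_client would normally be boto3.client("autoscaling")
        self.client = autoscaling_client
        self.ids = {
            "AutoScalingGroupName": group_name,
            "LifecycleHookName": hook_name,
            "InstanceId": instance_id,
        }

    def finish_when_idle(self, is_busy: Callable[[], bool],
                         poll_seconds: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> None:
        # While work is in progress, extend the hook's timeout and wait.
        while is_busy():
            self.client.record_lifecycle_action_heartbeat(**self.ids)
            sleep(poll_seconds)
        # Work is done: let Auto Scaling proceed with the termination.
        self.client.complete_lifecycle_action(
            LifecycleActionResult="CONTINUE", **self.ids)
```

Passing the client and the `sleep` function in as arguments keeps the class easy to exercise without touching AWS.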

Bottom line

Much of the above depends upon:

  • How often your application receives messages
  • How long it takes to process a message
  • The cost savings involved

If your messages are infrequent and simple to process, it might be worthwhile to continuously run a t2.micro instance. At 2c/hour, the benefit of scaling-in is minor. Also, there is always the risk when adding and removing instances that you might actually pay more, because instances are charged by the hour -- running an instance for 30 minutes, terminating it, then launching another instance for 30 minutes will actually be charged as two hours.
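The billing arithmetic can be checked with a few lines. This sketch assumes the per-hour rounding and the roughly 2c/hour rate described above; `billed_hours` and `HOURLY_RATE` are hypothetical names.

```python
import math

HOURLY_RATE = 0.02  # dollars per hour, the "2c/hour" figure above


def billed_hours(run_minutes: list) -> int:
    """Each separate run is rounded up to a whole hour, then summed."""
    return sum(math.ceil(minutes / 60) for minutes in run_minutes)


two_short_runs = billed_hours([30, 30])   # two launches of 30 minutes each
one_long_run = billed_hours([60])         # one launch of 60 minutes

print(two_short_runs, one_long_run)
print(two_short_runs * HOURLY_RATE, one_long_run * HOURLY_RATE)
```

Both cases run for 60 minutes in total, but the two short runs are billed as two hours and the single run as one.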

Finally, you could consider using AWS Lambda instead of an Amazon EC2 instance. Lambda is ideal for short-lived code execution without requiring a server. It could totally remove the need to use Amazon EC2 instances, and you only pay while the Lambda function is actually running.
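For comparison, a Lambda-based worker could be as small as the sketch below. It assumes the function is invoked with SQS messages as its event (a `Records` list whose items carry a `body` field), which is not covered above, and `process` is a hypothetical stand-in for the real work.

```python
def process(body: str) -> str:
    """Hypothetical placeholder for the real message processing."""
    return body.upper()


def handler(event, context):
    """Lambda entry point: process every message delivered in this event."""
    records = event.get("Records", [])
    for record in records:
        process(record["body"])
    return {"processed": len(records)}
```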

John Rotenstein answered Oct 22 '22 12:10
