Amazon Auto Scaling API for Job Servers

Tags:

I have read pretty much the entire documentation even beyond on the AWS AS API to understand all the AS stuff.

However I am still wondering (without having actually used the API yet since I wanna find this out first from someone) if my scenario is viable with AS.

Say I got a bunch of work servers setup within an AS group all working on a job each and suddenly it comes the time (I dunno say, AVG CPU is greater than or in another case less than 80%) to scale up or down.

My main worry is the loss of a currently in progress job. Maybe this would be better explained with an example:

I startup 5 job servers with 5 jobs on them
A job finishes on one and fires a scale down trigger in the Amazon API
Amazon comes to scale them down
I lose a job server that is actually currently running a job (90% complete gotta start again)

With this in mind is it better for me to just use the Amazon Spot Instance/EC2 API and just manage my own scaling or is there something I am missing about how the Amazon API judges server terminations?

To be honest I rather scale to SQS waiting amount than some health figure on the servers:

for every 100 messages waiting increase cluster capacity by 20%

But this doesn't seem to be too viable with AS either.

So is the AWS AS API not the right solution or am I missing some vital info about how it works?

Thanks,

839

asked Jun 04 '12 18:06

Sammaye

2 Answers

After some searching I found that there are two accepted ways to manage AS API or AS in general for jobs:

One method is to manipulate the health of a server directly from within the worker itself. This is what quite a few sites do and it is effective, when your worker detects no more jobs or redundancy in the system it marks the server it is on as unhealthy. This way the AS API comes along and automatically takes it down after a period of time.

So with this method you would have a scale up policy based on your SQS queue size over a period of time (say for every 5 mins of SQS messages being over 100 add 2 servers; every 10 mins of SQS messages being over 500 double network capacity by 50%). The scale down would be handled by code instead of an active policy.

This method would work with zero clusters too so you can down your cluster all the way to no servers when it's not being used making it quite cost effective.

Advantages:

Easy to setup
Using AWS API functions
Probably the quickest to setup
Using AWS managed API to manage the cluster size for you

Disadvantages:

Hard to manage without using full AWS API i.e. when making a new server you can't get it's instanceid back without doing a full API command return of all instanceids. There are other occasions the AWS AS API gets in your way and makes life a little harder if you want a element of self control over your cluster
Relying on Amazon to know what's best for your wallet. You are relying on the Amazon API to scale correctly, this is an advantage to many but a disadvantage to some.
The worker must house some of your server pool code meaning that the worker is not generic and can't just instantly be moved to another cluster without some configuration change.

With this in mind there is a second option, DIY. You use the EC2 Spot Instance and on Demand Instance API to make your own AS API based around your custom rules. This is pretty simple to explain:

You have a cli script that when run starts, say, 10 servers
You have a cronjob that when detects a satisfying of certain conditions downs the servers or ups more

Advantages:

Easy and clean to manage your end
Can make generic workers
The server pool can start to manage many clusters
You can make the rules and what not really quite complex getting figures from metrics on AWS and using them with comparison and time ranges to understand if things should change.

Disadvantages:

Hard to get multi-region (not so bad for SQS since SQS is single region)
Hard to deal with errors in region capacity and workload
You must rely on your own servers uptime and your own code to ensure that the cronjob runs as it should and provisions servers as it should and breaks them down when it should.

So really it seems to be a battle of which is more comfortable for the end user. I personally am mulling the two still and have created a small self hosted server pooler that could work for me but at the same time I am tempted to try and get this to work on AWS' own API.

Hope this helps people,

EDIT: Note with either of these methods you will still require a function on your side to predict how you should bid, as such you will need to call the bid history API on your spot type (EC2 type) and compute how to bid.

Another Edit: Another way to automatically detect redundancy in a system is to check the empty responses metric for your SQS queue. This is the amount of times your workers have pinged the queue and received no response. This is quite effective if you use an exclusive lock in your app for the duration of the worker.

answered Oct 04 '22 10:10

Sammaye

I just had the same kind of problem, and I talked to an Amazon guy who talked to me about the termination protection. In fact, if an instance has the termination protection activated, it can't be terminated. When a scale down is triggered, the app will be removed from the auto-scaling group, but it won't be terminated. To terminate it, you have to disable the termination protection and then terminate it (you can do that at the end of your job, for example).

To sum up, what you have to do is :

Add a startup script in your AMI to activate the Termination Protection
Keep your auto-scaling rules (scale-up and scale-down)
On the running instances, once it is safe to terminate an instance (end of a job, ...), deactivate the Termination Protection and terminate the instance

You can do all that using the AWS API.

answered Oct 04 '22 08:10

Talal MAZROUI

Related questions
                            
                                How do I create an EC2 image from a running instance using boto?
                            
                                How to Copy files from one EBS to Another EBS
                            
                                Amazon EC2 Load Testing
                            
                                Browser uploads to s3 with instance roles
                            
                                Use of "Proceed without key pair" in EC2 instance creation?
                            
                                Can AWS Lambdas receive inbound TCP connections?
                            
                                Cannot access ports in AWS ECS EC2 instance
                            
                                Programmatically Stop AWS EC2 in case of inactivity
                            
                                Installing pyOpenSSL on Amazon Linux (EC2)
                            
                                Python & Amazon EC2 -- Recommended Library? [closed]
                            
                                AWS Custom CloudWatch metrics - Aggregate by Auto-Scaling group
                            
                                what's the difference between attach and mount in ebs for amazon ec2
                            
                                Elastic Beanstalk unable to install packages
                            
                                I cannot locate production log files on Elastic Beanstalk ec2 instance
                            
                                Unable to connect to amazon EC2 instance via PuTTY
                            
                                Connecting to a remote Spark master - Java / Scala
                            
                                Boto3 - Print AWS Instance Average CPU Utilization
                            
                                How can I edit a file on EC2 directly from my localhost?
                            
                                Are Amazon's micro instances (Linux, 64bit) good for MongoDB servers?
                            
                                can't ssh after assigning an elastic ip

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Amazon Auto Scaling API for Job Servers

Tags:

amazon-web-services

amazon-ec2

autoscaling

Sammaye

People also ask

2 Answers

Sammaye

Talal MAZROUI

Recent Activity

Donate For Us