Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the correct Cloudwatch/Autoscale settings for extremely short traffic spikes on Amazon Web Services?

I have a site running on amazon elastic beanstalk with the following traffic pattern:

  • ~50 concurrent users normally.
  • ~2000 concurrent users for 1/2 minutes when post is made to Facebook page.

Amazon web services claim to be able to rapidly scale to challenges like this but the "Greater than x for more than 1 minute" setup of cloudwatch doesn't appear to be fast enough for this traffic pattern?

Usually within seconds all the ec2 instances crash, killing all cloudwatch metrics and the whole site is down for 4/6 minutes. So far I've yet to find a configuration that works for this senario.

Here is the graph of a smaller event that also killed the site: enter image description here

like image 992
Ben Avatar asked Jun 22 '12 12:06

Ben


2 Answers

Are these links posted predictably? If so, you can use Scaling by Schedule or as alternative you might change DESIRED-CAPACITY value of Auto Scaling Group or even trigger as-execute-policy to scale out straight before your link is posted.

Do you know you can have multiple scaling policies in one group? So you might have special Auto Scaling policy for your case, something like SCALE_OUT_HIGH which adds say 10 more instances at once. Take a look at as-put-scaling-policy command.

Also, you need to check your code and find bottle necks.

What HTTPD do you use? Consider of switching to Nginx as it's much more faster and less resource consuming software than Apache. Try to use Memcache... NoSQL like Redis for hight read and writes is fine option as well.

like image 150
Roman Newaza Avatar answered Nov 18 '22 10:11

Roman Newaza


The suggestion from AWS was as follows:

We are always working to make our systems more responsive, but it is challenging to provision virtual servers automatically with a response time of a few seconds as your use case appears to require. Perhaps there is a workaround that responds more quickly or that is more resilient when requests begin to increase.

Have you observed whether the site performs better if you use a larger instance type or a larger number of instances in the steady state? That may be one method to be resilient to rapid increases in inbound requests. Although I recognize it may not be the most cost-effective, you may find this to be a quick fix.

Another approach may be to adjust your alarm to use a threshold or a metric that would reflect (or predict) your demand increase sooner. For example, you might see better performance if you set your alarm to add instances after you exceed 75 or 100 users. You may already be doing this. Aside from that, your use case may have another indicator that predicts a demand increase, for example a posting on your Facebook page may precede a significant request increase by several seconds or even a minute. Using CloudWatch custom metrics to monitor that value and then setting an alarm to Auto Scale on it may also be a potential solution.

So I think the best answer is to run more instances at lower traffic and use custom metrics to predict traffic from an external source. I am going to try, for example, monitoring Facebook and Twitter for posts with links to the site and scaling up straight away.

like image 1
Ben Avatar answered Nov 18 '22 08:11

Ben