Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Worker "dyno" in AWS Elastic Beanstalk

Amazon Web Service has now a worker tiers in their Elastic Beanstalk. But, it nevertheless confuse us who come from the days of Worker dyno.

As a comparison, in Heroku, one can configure two dynos (something like processor?) each for web and worker. The web will work for any request, and will timeout normally at 15 secs. Thus, if you have a request that last more than that, your request will simply timed-out although not terminated per se. In that case, you should use worker and your web dyno should visit the endpoint several times per minutes (maybe) to check if there is any result to be brought back to the user. To make either worker or web dyno, what you need is just slide the slider and you are good to go. Sometimes, you may need a Procfile. But there is nothing fancy, or something really difficult, or confusing about it.

In AWS EBS (Elastic Beanstalk), since day 1 you hit eb init, you will be asked whether it is a Standard or Worker. When you hit Standard, it seems there is no way to make it as worker as well.

In our situation, the worker and standard web is located under one application. So, how could we use an EBS instance both for worker and standard. Our worker is using sidekiq, and redis. Please, point to any guidance or help us in this matter.

like image 566
user3551335 Avatar asked Aug 09 '14 03:08

user3551335


People also ask

What is worker in Elastic Beanstalk?

Elastic Beanstalk Worker is a managed compute environment for long running tasks. By using worker, we can decouple web application front-end from a process that performs blocking operations so that your application stays responsive under load.

Does Elastic Beanstalk auto scale?

Your AWS Elastic Beanstalk environment includes an Auto Scaling group that manages the Amazon EC2 instances in your environment. In a single-instance environment, the Auto Scaling group ensures that there is always one instance running.

What load balancer does Elastic Beanstalk use?

A load balancer distributes traffic among your environment's instances. When you enable load balancing, AWS Elastic Beanstalk creates an Elastic Load Balancing load balancer dedicated to your environment.


1 Answers

AWS Elastic Beanstalk has two types of Environments - Web tier and Worker tier.

Web tier environments are meant for web applications - http/https request processing. You get one or more EC2 instances behind a load balancer. You can get other resources like database per your requirement. You can choose the platform you wish e.g. Ruby, Python, Java, Node.js, PHP, Docker.

Worker environments are meant for asynchronous message processing. When you create a worker environment you do not have a load balancer. All your EC2 instances are in an autoscaling group. All these instances are running a daemon which is polling a single SQS queue for messages. When a message is pulled by the daemon from the SQS queue, the daemon sends a HTTP Post request on localhost:80. You can configure the port but the important thing is that the daemon posts the message as an HTTP request on localhost. Your worker application is actually a web application that receives the post request and processes the message. After the message is successfully processed the worker daemon expects that your web application running on localhost returns a HTTP 200 OK response. The daemon then deletes the message from SQS queue. You can write your worker application for any platform just like standard web server applications - Ruby, Python, Java, Node.js, PHP, Docker.

Based on my understanding of your usecase I would recommend creating two Elastic Beanstalk environments - one Standard and one Worker environment. The Standard web server receives HTTP requests and processes them synchronously. This environment puts the relevant data in an SQS queue. The second environment is a worker and the daemon running on this environment polls this SQS queue for messages. Your second environment is a web application that is NOT open to the internet. The worker daemon posts the messages as HTTP requests to your worker environment. Thus you can process long running workloads asynchronously using this second worker environment.

With worker environments you can use your own queues or Elastic Beanstalk can generate a queue for you. You can configure parameters like message visibility timeout, http connections based on your requirements or you can use the defaults.

Below are some links that may be useful for you:

http://aws.amazon.com/blogs/aws/background-task-handling-for-aws-elastic-beanstalk/

http://blogs.aws.amazon.com/application-management/post/Tx1Y8QSQRL1KQZC/Elastic-Beanstalk-Video-Tutorial-Worker-Tier

https://stackoverflow.com/a/23942498/161628

Does this meet your requirements? Please let me know if you have further questions.

Update

You need to upload your source code at two places - once for the worker environment and once for the web server environment. If someone was starting from scratch then they might have two separate code bases. But I think in your case I think it should be perfectly fine to have a single code base shared between the two environments. Suppose your web request arrives at '/register', then the register() method in your application can post messages to an SQS queue and be done with the HTTP request. Now your worker environment will poll the SQS queue and post messages over HTTP on localhost to the URL '/async_register' which will invoke a method async_register() in your application and do the asynchronous processing. These two methods can live in the same source code bundle which can be shared by both the worker and web server environment. The code path taken by worker and web server will be different so that web server environments will invoke register() and worker environments will invoke the async_register() method.

Another caveat is that HTTP requests sent by the worker daemon on localhost will contain an HTTP header - "User-Agent": "aws-sqsd/1.1". Read more here. So in your web application you can have a single listener to post requests on "/register" and depending on whether this header is present or not you invoke register() or async_register() methods internally.

Also I think if you want to share the code base between the two environments you can upload the code base at only one place. Your environments are logically grouped into applications. So you can have a single application. You upload your source code to this application using the "CreateApplicationVersion" API call. Suppose you upload an application version with label 'v1'. You can now create a worker environment and a web server environment under the same application. When you create an environment you need to provide a version to deploy to your enviroment. In this case you can deploy v1 to both environments. So you will be sharing the same source code for both environments. When you have a new version "v2". You upload this version and then perform an update on both environments changing their version to "v2".

The same version of the source code can be deployed to both environments. They will be running on different EC2 instances because one environment is dedicated for responding to web requests and one environment is dedicated for responding to asynchronous web requests (worker).

like image 60
Rohit Banga Avatar answered Sep 23 '22 23:09

Rohit Banga