Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Celery message queue vs AWS Lambda task processing

Currently I'm developing a system to analyse and visualise textual data based on NLP.

The backend (Python+Flask+AWS EC2) handles the analysis, and uses an API to feed the result back to a frontend (FLASK+D3+Heroku) app that solely handles interactive visualisations.

Right now the analysis in the prototype is a basic python function which means on large files the analysis take longer and thus resulting a request timeout during the API data bridging to frontend. As well as the analysis of many files is done in a linear blocking queue.

So to scale this prototype, I need to modify the Analysis(text) function to be a background task so it does not block further execution and can do a callback once the function is done. The input text is fetched from AWS S3 and the output is a relatively large JSON format aiming to be stored in AWS S3 as well, so the API bridge will simply fetch this JSON that contains data for all the graphs in the frontend app. (I find S3 slightly easier to handle than creating a large relational database structure to store persistent data..)

I'm doing simple examples with Celery and find it fitting as a solution, however i just did some reading in AWS Lambda which on paper seems like a better solution in terms of scaling...

The Analysis(text) function uses a pre-built model and functions from relatively common NLP python packages. As my lack of experience in scaling a prototype I'd like to ask for your experiences and judgement of which solution would be most fitting for this scenario.

Thank you :)

like image 817
JQian Avatar asked Oct 21 '16 09:10

JQian


People also ask

How does Celery execute tasks?

Process of Task Execution by Celery can be broken down into:Your application sends the tasks to the task broker, it is then reserved by a worker for execution & finally the result of task execution is stored in the result backend.

What are the three different ways you can deploy your code to Lambda?

You can upload a . zip file as your deployment package using the Lambda console, AWS Command Line Interface (AWS CLI), or to an Amazon Simple Storage Service (Amazon S3) bucket.

What language is fastest in Lambda?

The benefits of Python in AWS Lambda environments It's about 100 times faster than Java or C#. Third-party modules. Like npm, Python has a wide variety of modules available. The feature helps ease interaction with other languages and platforms.

How many requests can Lambda handle per second?

Lambda uses the 3,000 available burst concurrency to create new environments to handle the additional load. 1,000 requests are throttled as there is not enough burst concurrency to handle all 5,000 requests. Transactions per second are 16,000.


1 Answers

I would like to share a personal experience. I moved my heavy-lifting tasks to AWS Lambda and I must admit that the ROI has been pretty good. For instance, one of my tasks was to generate monthly statements for the customers and then mail them to the customers as well. The data for each statement was fed into a Jinja template which gave me an HTML of the statement. Using Weasyprint, I converted the HTML to a Pdf file. And then mailing those pdf statements was the last step. I researched through various options for creating pdf files directly, but they didn't looked feasible for me.

That said, when the scale was low, i.e. for when the number of customers was small, celery was wonderful. However to mention, during this task, I observed CPU usages went high. I would add to the celery queue this task for each of the customers, from which the celery workers would pick up the tasks and execute it.

But when the scale went high, celery didn't turn out to be a robust option. CPU usages were pretty high(I don't blame celery for it, but that is what I observed). Celery is still good though. But do understand this, that with celery, you can face scaling issues. Vertical scaling may not help you. So you need horizontally scale as your backend grows to get get a good performance from celery. When there are a lot of tasks waiting in the queue, and the number of workers is limited, naturally a lot of tasks would have to wait.

So in my case, I moved this CPU-intensive task to AWS Lambda. So, I deployed a function that would generate the statement Pdf from the customer's statement data, and mail it afterward. Immediately, AWS Lambda solved our scaling issues. Secondly, since this was more of a period task, not a daily task - so we didn't need to run celery everyday. The Lambda would launch whenever needed - but won't run when not in use. Besides, this function was in NodeJS, since the npm package I found turned out to be more efficient the solution I had in Python. So Lambda is also advantageous because you can take advantages of various programming languages, yet your core may be unchanged. Also, I personally think that Lambda is quite cheap - since the free tier offers a lot of compute time per month(GB-seconds). Also, underlying servers on which your Lambdas are taken care to be updated to the latest security patches as and when available. As you can see, my maintenance cost has drastically dropped.

AWS Lambdas scale as per need. Plus, they can serve a good use case for tasks like real-time stream processing, or for heavy data-processing tasks, or for running tasks which could be very CPU intensive.

like image 149
Archit Kapoor Avatar answered Sep 21 '22 13:09

Archit Kapoor