Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

gunicorn and/or celery: What is the way get the best out of both?

I've a machine learning application which uses flask to expose api(for production this is not a good idea, but even if I'll use django in future the idea of the question shouldn't change).

The main problem is how to serve multiple requests to my app. Few months back celery has been added to get around this problem. The number of workers in celery that was spawned is equal to the number of cores present in the machine. For very few users this was looking fine and was in production for some time.

When the number of concurrent users got increased, it was evident that we should do a performance testing on it. It turns out: it is able to handle 20 users for 30 GB and 8 core machine without authentication and without any front-end. Which is not looking like a good number.

I didn't know there are things like: application server, web server, model server. When googling for this problem: gunicorn was a good application server python application.

  • Should I use gunicorn or any other application server along with celery and why
  • If I remove celery and only use gunicorn with the application can I achieve concurrency. I have read somewhere celery is not good for machine learning applications.
  • What are the purposes of gunicorn and celery. How can we achieve the best out of both.

Note: Main goal is to maximize concurrency. While serving in production authentication will be added. One front-end application might come into action in between in production.

like image 649
Abhisek Avatar asked Jan 02 '23 12:01

Abhisek


1 Answers

There is no shame in flask. If in fact you just need a web API wrapper, flask is probably a much better choice than django (simply because django is huge and you'd be using only a fraction of its capability).

However, your concurrency problems are apparently stemming from the fact that you are doing some heavy-duty processing for each request. There is simply no way around that; if you require a certain amount of computational resources per request, you can't magic those up. From here on, it's a juggling act.

  • If you want a guaranteed response immediately, you need to have as many workers as potential simultaneous requests. This may involve load balancing over multiple servers, if you can't scrounge up enough resources on one server. (cue gunicorn, a web application server, responsible for accepting connections and then distributing them to multiple application processes.)

  • If you are okay with not getting an immediate response, you can let stuff queue up. (cue celery, a task queue, which worker processes can use to retrieve the next thing to be done, and deposit results). This works best if you don't need a response in the same request-response cycle; e.g. you submit a job from client, and they only get an acknowledgement that the job has been received; you would need a second request to ask about the status of the job, and possibly the results of the job if it is finished.

  • Alternately, instead of Flask you could use websockets or Tornado, to push out the response to the client when it is available (as opposed to user polling for results, or waiting on a live HTTP connection and taking up a server process).

like image 137
Amadan Avatar answered Jan 06 '23 07:01

Amadan