Flask application scaling on Kubernetes and Gunicorn

We have a Flask application that is served via gunicorn, using the eventlet worker. We're deploying the application in a kubernetes pod, with the idea of scaling the number of pods depending on workload.

The recommended setting for the number of workers in gunicorn is 2-4 x $NUM_CPUS. See the docs. I've previously deployed services on dedicated physical hardware where such calculations made sense. On a 4-core machine, 16 workers sounded OK, and we eventually bumped it to 32 workers.
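
The rule of thumb quoted above boils down to a multiplication. Here is a minimal sketch, assuming the core count comes from `os.cpu_count()`. The helper name `suggested_workers` is hypothetical and is not part of Gunicorn:

```python
import os


def suggested_workers(num_cpus: int, factor: int = 2) -> int:
    """Worker count using the 2-4 x $NUM_CPUS rule of thumb (hypothetical helper)."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be positive")
    if not 2 <= factor <= 4:
        raise ValueError("factor should be in the 2-4 range")
    return factor * num_cpus


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1.
    cpus = os.cpu_count() or 1
    print(suggested_workers(cpus, factor=4))
```

On the 4-core machine described above, `suggested_workers(4, factor=4)` gives 16.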

Does this calculation still apply in a kubernetes pod using an async worker, particularly as:

  1. There could be multiple pods on a single node.
  2. The same service will be run in multiple pods.
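
Point 1 matters because `os.cpu_count()` inside a container generally reports the node's CPUs, not the pod's CPU limit. The following is a minimal sketch, assuming cgroup v2. Under cgroup v2 the limit appears in `/sys/fs/cgroup/cpu.max` as `"<quota> <period>"`, or as `"max <period>"` when no limit is set. The function names are hypothetical:

```python
import math
import os
from typing import Optional

# cgroup v2 location of the CPU quota (assumption: the node uses cgroup v2).
CPU_MAX_PATH = "/sys/fs/cgroup/cpu.max"


def parse_cpu_max(text: str) -> Optional[float]:
    """Turn cpu.max content ("<quota> <period>") into a CPU count.

    Returns None when no quota is set ("max <period>").
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cpus(path: str = CPU_MAX_PATH) -> int:
    """CPU limit of the container, falling back to os.cpu_count()."""
    host_cpus = os.cpu_count() or 1
    try:
        with open(path) as f:
            limit = parse_cpu_max(f.read())
    except (OSError, ValueError):
        limit = None  # file missing/unreadable or not in the expected format
    if limit is None:
        return host_cpus
    # Round fractional limits (e.g. 0.5 CPU) up, but never exceed the host count.
    return min(host_cpus, max(1, math.ceil(limit)))


if __name__ == "__main__":
    print(effective_cpus())
```

A pod with a 2-CPU limit would typically see `200000 100000` in that file, which parses to 2.0 regardless of how many cores the node has.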

How should I set the number of gunicorn workers?

  1. Set it to -w 1 and let kubernetes handle the scaling via pods?
  2. Set it to 2-4 x $NUM_CPUS of the kubernetes nodes. On one pod or multiple?
  3. Something else entirely?

Update

We decided to go with the 1st option, which is our current approach: set the number of gunicorn workers to 1 and scale horizontally by increasing the number of pods. Otherwise there would be too many moving parts, and we wouldn't be leveraging Kubernetes to its full potential.
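
With option 1, the per-pod Gunicorn configuration stays small. Here is a minimal sketch of a `gunicorn.conf.py`. It assumes the Flask app object is importable as `app:app` (hypothetical) and that the pod may pass `PORT` and `WORKER_CONNECTIONS` environment variables (also assumptions). The settings `workers`, `worker_class`, `worker_connections` and `bind` are standard Gunicorn settings:

```python
# gunicorn.conf.py (run with: gunicorn -c gunicorn.conf.py app:app, where app:app is hypothetical)
import os

# One worker per pod; Kubernetes scales by adding pods instead of workers.
workers = 1

# Eventlet (async) worker, as in the question; requires the eventlet package.
worker_class = "eventlet"

# Max simultaneous clients per async worker (Gunicorn's default is 1000).
worker_connections = int(os.environ.get("WORKER_CONNECTIONS", "1000"))

# Listen on all interfaces so the Kubernetes Service can reach the container.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"
```

Keeping one worker per pod means each pod's resource requests and limits describe a single process, which keeps the autoscaler's view of load simple.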

CadentOrange asked Jun 25 '19 07:06


1 Answer

For better visibility, here is the final solution chosen by the original author of this question (as of 2019):

Set the number of gunicorn workers to 1 (-w 1), and scale horizontally by increasing the number of pods (using the Kubernetes Horizontal Pod Autoscaler, HPA).
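
The Kubernetes HPA documentation gives the core scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). Here is a minimal sketch of that arithmetic only. It ignores min/max replica bounds, stabilization windows and not-ready pods, and the function name is hypothetical:

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    # The HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 4 single-worker pods averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60)`, i.e. 6 pods.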

Note, however, that this might not remain the best answer for long, given how quickly workload-related features are evolving in Kubernetes. For example, besides HPA, some Kubernetes distributions also offer Vertical Pod Autoscaling (VPA) and Multidimensional Pod Autoscaling (MPA). For that reason, I propose continuing this thread as a community wiki post.

Nepomucen answered Sep 18 '22 20:09