 

Difference Between Gunicorn Worker Processes and Heroku Worker Dynos

I'm hoping the community can clarify something for me, and that others can benefit.

My understanding is that Gunicorn worker processes are essentially virtual replicas of Heroku web dynos. In other words, Gunicorn's worker processes should not be confused with Heroku's worker dynos (e.g. the processes that run Django Celery tasks).

This is because Gunicorn worker processes are focused on handling web requests (essentially scaling up the throughput of the Heroku web dyno), while Heroku worker dynos specialize in long-running background tasks such as remote API calls.
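That distinction is easiest to see in a Procfile. A minimal sketch (the project name, worker count, and Celery setup here are placeholders, not details from my app):

    web: gunicorn myproject.wsgi --bind 0.0.0.0:$PORT --workers 3
    worker: celery -A myproject worker --loglevel=info

The --workers 3 on the web line is the Gunicorn setting, while the worker line defines an entirely separate Heroku worker dyno.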

I have a simple Django app that makes decent use of remote APIs, and I want to optimize the resource balance. I am also querying a PostgreSQL database on most requests.
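For concreteness, a task like the one below is the kind of work that belongs on a worker dyno rather than in the request cycle. This is only a hypothetical sketch assuming Celery and the requests library; none of the names come from my actual app:

    # tasks.py -- hypothetical example
    import requests
    from celery import shared_task

    @shared_task
    def fetch_remote_data(resource_id):
        # The slow network call runs on a worker dyno, keeping the web
        # dyno's Gunicorn workers free to serve requests.
        response = requests.get(
            "https://api.example.com/resources/{0}".format(resource_id)
        )
        response.raise_for_status()
        return response.json()

A view would then enqueue it with fetch_remote_data.delay(some_id) instead of calling the API inline.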

I know that this is very much an oversimplification, but am I thinking about things correctly?

Some relevant info:

https://devcenter.heroku.com/articles/process-model

https://devcenter.heroku.com/articles/background-jobs-queueing

https://devcenter.heroku.com/articles/django#running-a-worker

http://gunicorn.org/configure.html#workers

http://v3.mike.tig.as/blog/2012/02/13/deploying-django-on-heroku/

https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/gunicorn/

Other quasi-related SO questions that may help those researching this topic:

Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack

Performance degradation for Django with Gunicorn deployed into Heroku

Configuring gunicorn for Django on Heroku


asked Oct 09 '12 by BFar

1 Answer

To provide an answer and prevent people from having to search through the comments: a dyno is like an entire computer. Using the Procfile, you give each of your dynos one command to run, and the dyno cranks away on that command, restarting it periodically to refresh it and restarting it when it crashes. As you can imagine, dedicating an entire computer to running a single-threaded webserver is rather wasteful, and that's where Gunicorn comes in.

The Gunicorn master process does nothing but supervise a given number of copies of your application (workers), with incoming HTTP requests distributed amongst them. This takes advantage of the fact that each dyno actually has multiple cores. As someone mentioned, the number of workers you should choose depends on how much memory your app takes to run.
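For illustration, the worker count can be set on the command line (gunicorn -w 3 ...) or in a Python config file. A common starting heuristic is (2 x cores) + 1, but treat this as a sketch: on a small dyno, memory is usually the binding constraint, so you may need fewer:

    # gunicorn.conf.py -- illustrative starting point only
    import multiprocessing

    # (2 x cores) + 1 is the usual rule of thumb; lower it if your
    # app's per-worker memory footprint makes the dyno swap.
    workers = multiprocessing.cpu_count() * 2 + 1

The web dyno would then run something like gunicorn myproject.wsgi -c gunicorn.conf.py -b 0.0.0.0:$PORT.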

Contrary to what Bob Spryn said in the last comment, there are other ways of exploiting this opportunity for parallelism, namely running separate servers on the same dyno. The easiest way is to make a separate sub-Procfile and run the all-Python Foreman equivalent, Honcho, from your main Procfile, following these directions. Essentially, in this case your single dyno command is a program that manages multiple single commands, as sketched below. It's kind of like being granted one wish from a genie, and making that wish be for four more wishes.
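A minimal sketch of that arrangement (file names and commands are placeholders, and depending on how Honcho assigns ports you may need to pass Heroku's $PORT through explicitly). The top-level Procfile hands the dyno over to Honcho:

    web: honcho -f Procfile.multi start

and Procfile.multi lists the processes Honcho should supervise:

    web: gunicorn myproject.wsgi -b 0.0.0.0:$PORT
    worker: celery -A myproject worker --loglevel=info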

The advantage of this is that you get to use your dynos' full capacity. The disadvantage is that you lose the ability to scale the individual parts of your app independently when they share a dyno. When you scale the dyno, it scales everything you've multiplexed onto it, which may not be what you want. You will probably have to use diagnostics to decide when a service should be moved onto its own dedicated dyno.

answered Sep 28 '22 by acjay