 

CRITICAL WORKER TIMEOUT on gunicorn when deployed to AWS

I have a Flask web app served by gunicorn, and I use the gevent worker class because that previously stopped me from getting [CRITICAL] WORKER TIMEOUT errors. Since deploying it to AWS behind an ELB, however, the issue has come back.

I have also tried the eventlet worker class before; that didn't work, but gevent did work locally.

This is the shell script that I have used as an entrypoint for my Dockerfile:

gunicorn -b 0.0.0.0:5000 --worker-class=gevent --worker-connections 1000 --timeout 60 --keep-alive 20 dataclone_controller:app
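For reference, the entrypoint script itself is just a thin wrapper around this command; a minimal sketch (the file name entrypoint.sh is an assumption, and the exec is there so gunicorn runs as PID 1 and receives container signals directly):

    #!/bin/sh
    # exec replaces the shell, so SIGTERM from the container runtime reaches gunicorn itself
    exec gunicorn -b 0.0.0.0:5000 --worker-class=gevent --worker-connections 1000 \
        --timeout 60 --keep-alive 20 dataclone_controller:app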

When I check the logs on the pods, this is the only information that gets printed out:

    [2019-09-04 11:36:12 +0000] [8] [INFO] Starting gunicorn 19.9.0
    [2019-09-04 11:36:12 +0000] [8] [INFO] Listening at: http://0.0.0.0:5000 (8)
    [2019-09-04 11:36:12 +0000] [8] [INFO] Using worker: gevent
    [2019-09-04 11:36:12 +0000] [11] [INFO] Booting worker with pid: 11
    [2019-09-04 11:38:15 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:11)
asked Sep 04 '19 by siddharth.nair

People also ask

Why does a gunicorn worker time out?

WORKER TIMEOUT means your application cannot respond to a request within the configured amount of time. You can set this limit using gunicorn's timeout settings. Some applications need more time to respond than others.
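For example, the timeout can be raised either on the command line or in a config file; a minimal sketch (the 120-second value and the myapp:app module path are only illustrations):

    # on the command line
    gunicorn --timeout 120 myapp:app

    # or in gunicorn.conf.py, loaded with: gunicorn -c gunicorn.conf.py myapp:app
    timeout = 120
    graceful_timeout = 30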

What is gunicorn's default timeout?

By default, gunicorn gracefully restarts a worker if it hasn't completed any work within the last 30 seconds. If you expect your application to respond quickly to a constant incoming flow of requests, try experimenting with a lower timeout configuration.
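To make this concrete, any handler that blocks for longer than the timeout is enough to trigger the restart under the default sync worker; a contrived Flask sketch (the route is made up):

    import time
    from flask import Flask

    app = Flask(__name__)

    @app.route("/slow")
    def slow():
        # With the default sync worker and the default --timeout of 30 seconds,
        # this request never completes: the master kills the worker and logs
        # [CRITICAL] WORKER TIMEOUT, exactly as in the output above.
        time.sleep(45)
        return "done"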

Where does gunicorn log by default?

Gunicorn writes its error log to stderr by default. The errorlog setting controls the error log file to write to; using '-' for FILE makes gunicorn log to stderr (the default since version 19.2).
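When the default output is as sparse as the log above, raising the log level and capturing the application's own stdout/stderr into the same stream often reveals what the worker was doing when it was killed; for example (all standard gunicorn flags):

    gunicorn --error-logfile - --access-logfile - --log-level debug --capture-output \
        dataclone_controller:app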

How many workers should gunicorn have?

Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with.
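That rule of thumb is usually applied with something like this (a sketch; whether the result suits a given workload still needs to be measured):

    # gunicorn.conf.py
    import multiprocessing

    # (2 x $num_cores) + 1, the starting point recommended by the gunicorn docs
    workers = multiprocessing.cpu_count() * 2 + 1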


1 Answer

For our Django application, we eventually tracked this down to memory exhaustion. This is difficult to diagnose because the AWS monitoring does not provide memory statistics (at least by default), and even if it did, it's not clear how easy a transient spike would be to spot (a crude way to watch for this is sketched after the list of symptoms below).

Additional symptoms included:

  • We would often lose network connectivity to the VM at this point.
  • /var/log/syslog contained some evidence of some processes restarting (in our case, this was mostly Hashicorp's Consul).
  • There was no evidence of the Linux OOM detection coming into play.
  • We knew the system was busy because the AWS CPU stats would often show a spike (to, say, 60%).
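Because the console gave us no memory numbers, one crude way to make a transient spike visible is to have each worker log its own resident memory periodically; a rough sketch using psutil, purely as an illustration:

    import logging
    import threading
    import time

    import psutil  # third-party: pip install psutil

    def log_memory_every(seconds=30):
        # Periodically log this process's resident memory so spikes
        # show up in the gunicorn error log alongside the timeouts.
        proc = psutil.Process()  # current process, i.e. the worker

        def _loop():
            while True:
                rss_mb = proc.memory_info().rss / (1024 * 1024)
                logging.warning("worker rss: %.1f MiB", rss_mb)
                time.sleep(seconds)

        threading.Thread(target=_loop, daemon=True).start()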

The fix for us lay in judicious conversion of Django queries which looked like this:

    for item in qs:
        do_something()

to use .iterator() like this:

    CHUNK_SIZE = 5
    ...
    for item in qs.iterator(CHUNK_SIZE):
        do_something()

which effectively trades database round-trips for lower memory usage. Note that CHUNK_SIZE = 5 made sense because we were fetching some database objects with big columns of JSONB. I expect that more typical usage might use a number several orders of magnitude larger.
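Going one step further: if the loop does not actually need the big JSONB columns, deferring them keeps each chunk smaller still; a sketch with a made-up field name:

    # 'payload' stands in for the wide JSONB column; defer() only loads it if accessed
    for item in qs.defer('payload').iterator(CHUNK_SIZE):
        do_something()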

answered Sep 19 '22 by Shaheed Haque