gunicorn worker exits for every request

I have a fresh installation of apache-airflow 1.8.2. I started its webserver, and its gunicorn workers exit on every webpage request, leaving the request hanging for around 30s while waiting for a new worker to spawn. I need help fixing this issue.

Details below

I've installed apache-airflow 1.8.2 and followed this guide. I started the webserver at port 8081.
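For reference, the command I used to start it was roughly this (from memory, so the exact flags may differ):

$ airflow webserver -p 8081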

Now when I visit the server using my browser, the response is very slow. I looked at the console output and noticed that every time I load a webpage, it says "Worker exiting", then pauses for a long time and says "Booting worker".

After digging into the source code, I found out that these are gunicorn workers. I have no experience with gunicorn, airflow, or Flask, so I don't know if this is the expected behavior, but I feel like it shouldn't be; at the very least, a webserver should not hang for half a minute on every page load.

Console output:

---> Browser request
[2017-11-01 19:08:07 -0700] [14549] [INFO] Worker exiting (pid: 14549)
---> Hangs for 30s
[2017-11-01 19:08:37 -0700] [13316] [INFO] Handling signal: ttin
[2017-11-01 19:08:37 -0700] [14698] [INFO] Booting worker with pid: 14698
/Users/michael/Programs/clones/airflow/airflow/www/app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to "CSRFProtect" and will be removed in 1.0.
  csrf = CsrfProtect()
/Users/michael/Programs/miaozhen/tests/airflow-test/lib/python3.6/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
127.0.0.1 - - [01/Nov/2017:19:08:37 -0700] "GET /admin/ HTTP/1.1" 200 95063 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
[2017-11-01 19:08:38,096] [14698] {models.py:168} INFO - Filling up the DagBag from /Users/michael/airflow/dags
---> other GET requests on the same webpage, skipped here for simplicity
[2017-11-01 19:08:39 -0700] [13316] [INFO] Handling signal: ttou

Now I'm running a source version of apache-airflow 1.8.2 (i.e. cloned the source, checked out the tag, and installed with pip install -e .) in a virtualenv. However, I've also tried running the PyPI version (pip install apache-airflow) without a virtualenv, and running the source version without a virtualenv. The same problem exists for all installations, so these differences seem irrelevant.
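For completeness, the source install was roughly the following (the repository URL and tag name are written from memory, so treat them as approximate):

$ git clone https://github.com/apache/incubator-airflow.git
$ cd incubator-airflow
$ git checkout 1.8.2
$ pip install -e .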

My Python installation is:

$ python -VV
Python 3.6.3 (default, Oct  4 2017, 06:09:38) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]

EDIT:

I tried installing & running apache-airflow on another machine (Ubuntu Linux 16.04 + Python 3.5), and there is no problem. I also asked another person who is on a Mac with Python 3.6, and there is no problem either. I guess there's something weird with my machine... Any suggestions on how I can debug this?

asked Nov 02 '17 by Michael Kim

People also ask

How many requests can a gunicorn worker handle?

If each request takes exactly 1 millisecond to handle, then a single worker can serve 1000 RPS.
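As a rough illustration, with 4 such sync workers the theoretical ceiling would be about 4 × 1000 = 4000 RPS, ignoring connection and scheduling overhead (the worker count of 4 is just an example figure).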

How does gunicorn handle multiple requests?

Gunicorn is based on the pre-fork worker model. This means that there is a central master process that manages a set of worker processes. The master never knows anything about individual clients. All requests and responses are handled completely by worker processes.
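As a minimal sketch of that model (the module and application names, myapp:app, are placeholders), starting gunicorn with one master and four workers looks like:

$ gunicorn --workers 4 --bind 127.0.0.1:8000 myapp:app

The master only manages signals and worker lifecycles; each incoming request is handled entirely by one of the four worker processes.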

What is gunicorn default timeout?

The default Gunicorn timeout is 30 seconds. If raising the proxy timeout (e.g. proxy_read_timeout 300s; followed by an Nginx restart) doesn't fix it, increase Gunicorn's own timeout flag, e.g. --timeout 90. See the Gunicorn documentation about timeout.

What is graceful timeout in gunicorn?

After 30 seconds of request processing (configurable with timeout), the gunicorn master process sends SIGTERM to the worker process to initiate a graceful restart. If the worker does not shut down within another 30 seconds (configurable with graceful_timeout), the master process sends SIGKILL.
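A minimal sketch of raising both settings in a gunicorn config file (the values and the myapp:app name are only examples):

# gunicorn.conf.py
timeout = 90           # seconds of request handling before the master sends SIGTERM
graceful_timeout = 30  # extra seconds the worker gets to shut down before SIGKILL

$ gunicorn -c gunicorn.conf.py myapp:app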


1 Answer

Workers regularly exiting via the ttou signal (which tells the gunicorn master to decrement the number of worker processes by one) is intentional. This is airflow periodically "refreshing" its webserver workers. Based on what I read in AIRFLOW-276, which added this feature, refreshing workers ensures they pick up new or updated DAGs. This behavior can be tuned in your airflow config via worker_refresh_interval and worker_refresh_batch_size.

From looking at the source, it spins up new workers before spinning down old ones, so I don't think this should delay your requests. However, you can try disabling the refresh by setting worker_refresh_batch_size = 0.
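As a sketch, these options live under the [webserver] section of airflow.cfg; the interval value below is only an example, so check your own config for the actual default:

[webserver]
# number of workers to refresh at a time; 0 disables the periodic refresh
worker_refresh_batch_size = 0
# seconds to wait between refresh batches
worker_refresh_interval = 30

Restart the webserver after changing these settings.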

answered Oct 08 '22 by Daniel Huang