airflow webserver starting - gunicorn workers shutting down

I am running Airflow 1.8 on CentOS 7 in Docker, and the webserver never becomes reachable from the browser. I installed Airflow via pip for Python 2.7. The Flower UI displays fine, and initdb ran and connected to the Postgres and Redis backends. I am using the CeleryExecutor, running on ECS, as root. The webserver is started with airflow webserver on the default port 8080.

Does anyone know the causes of, or solutions for, the gunicorn workers exiting as shown in the log below? Specifically, this line seems to be the problem:

ERROR - [0 / 0] some workers seem to have died and gunicorndid not restart them as expected

Whole log...

[2018-04-13 20:05:01,161] {db.py:287} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Done.
[2018-04-13 20:05:02,358] {__init__.py:57} INFO - Using executor CeleryExecutor
/usr/local/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/

[2018-04-13 20:05:03,363] [1] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-13 20:05:04,488] {__init__.py:57} INFO - Using executor CeleryExecutor
[2018-04-13 20:05:04 +0000] [18] [INFO] Starting gunicorn 19.3.0
[2018-04-13 20:05:04 +0000] [18] [INFO] Listening at: http://0.0.0.0:8080 (18)
[2018-04-13 20:05:04 +0000] [18] [INFO] Using worker: sync
[2018-04-13 20:05:04 +0000] [24] [INFO] Booting worker with pid: 24
[2018-04-13 20:05:05 +0000] [25] [INFO] Booting worker with pid: 25
/usr/local/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2018-04-13 20:05:05 +0000] [26] [INFO] Booting worker with pid: 26
[2018-04-13 20:05:05 +0000] [27] [INFO] Booting worker with pid: 27
/usr/local/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================            
/usr/local/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
/usr/local/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2018-04-13 20:05:06,461] [24] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-13 20:05:07,873] [1] {cli.py:723} ERROR - [0 / 0] some workers seem to have died and gunicorndid not restart them as expected
[2018-04-13 20:05:08,271] [27] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-13 20:05:08,271] [25] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-13 20:05:08,271] [26] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-13 20:05:09 +0000] [25] [INFO] Parent changed, shutting down: <Worker 25>
[2018-04-13 20:05:09 +0000] [25] [INFO] Worker exiting (pid: 25)
[2018-04-13 20:05:09 +0000] [26] [INFO] Parent changed, shutting down: <Worker 26>
[2018-04-13 20:05:09 +0000] [26] [INFO] Worker exiting (pid: 26)
[2018-04-13 20:05:09 +0000] [27] [INFO] Parent changed, shutting down: <Worker 27>
[2018-04-13 20:05:09 +0000] [27] [INFO] Worker exiting (pid: 27)

I swear I had this working not long ago, and I don't know what changed. Here is the list of pip packages I installed:

airflow (1.8.0)
alembic (0.8.10)
amqp (2.2.2)
asn1crypto (0.24.0)
awscli (1.15.4)
Babel (2.5.3)
backports-abc (0.5)
billiard (3.5.0.3)
boto3 (1.7.4)
botocore (1.10.4)
celery (4.0.2)
certifi (2018.1.18)
cffi (1.11.5)
chardet (3.0.4)
click (6.7)
colorama (0.3.7)
croniter (0.3.20)
cryptography (2.2.2)
Cython (0.28.2)
dill (0.2.7.1)
docutils (0.14)
enum34 (1.1.6)
Flask (0.11.1)
Flask-Admin (1.4.1)
Flask-Cache (0.13.1)
Flask-Login (0.2.11)
flask-swagger (0.2.13)
Flask-WTF (0.12)
flower (0.9.2)
funcsigs (1.0.0)
future (0.15.2)
futures (3.2.0)
gitdb2 (2.0.3)
GitPython (2.1.9)
gunicorn (19.3.0)
idna (2.6)
ipaddress (1.0.19)
itsdangerous (0.24)
Jinja2 (2.8.1)
jmespath (0.9.3)
kombu (4.1.0)
lockfile (0.12.2)
lxml (3.8.0)
Mako (1.0.7)
Markdown (2.6.11)
MarkupSafe (1.0)
ndg-httpsclient (0.4.4)
numpy (1.14.2)
ordereddict (1.1)
pandas (0.22.0)
pip (9.0.3)
psutil (4.4.2)
psycopg2-binary (2.7.4)
pyasn1 (0.4.2)
pycparser (2.18)
Pygments (2.2.0)
pyOpenSSL (17.5.0)
python-daemon (2.1.2)
python-dateutil (2.7.2)
python-editor (1.0.3)
python-nvd3 (0.14.2)
python-slugify (1.1.4)
pytz (2018.4)
PyYAML (3.12)
redis (2.10.6)
requests (2.18.4)
rsa (3.4.2)
s3transfer (0.1.13)
setproctitle (1.1.10)
setuptools (39.0.1)
singledispatch (3.4.0.3)
six (1.11.0)
smmap2 (2.0.3)
SQLAlchemy (1.2.6)
tabulate (0.7.7)
thrift (0.9.3)
tornado (5.0.2)
Unidecode (1.0.22)
urllib3 (1.22)
vine (1.1.4)
Werkzeug (0.14.1)
wheel (0.31.0)
WTForms (2.1)
zope.deprecation (4.3.0)

UPDATE: I installed from source and am now getting this error from the webserver:

[2018-04-14 00:20:48,594] {{cli.py:718}} ERROR - [0 / 0] some workers seem to have died and gunicorndid not restart them as expected
[2018-04-14 00:20:50,396] {{models.py:197}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-14 00:20:50,396] {{models.py:197}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-14 00:20:50,396] {{models.py:197}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2018-04-14 00:24:18,135] {{cli.py:725}} ERROR - No response from gunicorn master within 120 seconds
[2018-04-14 00:24:23,032] {{cli.py:726}} ERROR - Shutting down webserver

I think this is a consequence of https://issues.apache.org/jira/browse/AIRFLOW-1235, which shuts down the webserver when the gunicorn workers die.

UPDATE: OK, this fixed itself somehow. I don't know how, because I changed a number of things, but installing gunicorn with greenlet, eventlet, and gevent might have helped. It could also have been something in my entrypoint, perhaps a race from executing airflow webserver right after airflow initdb. I am leaving the question up because I hit this with a puckel install before as well, and I would love to know whether this is a bug others are facing and what the underlying issue was.
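
One way to rule out the entrypoint timing suspected above is to make the ordering explicit: wait until the database port accepts connections, let airflow initdb finish, and only then start the webserver. The following is a minimal sketch under assumptions, not a confirmed fix for this issue. The host name "postgres" and port 5432 are placeholders, and wait_for_port and entrypoint are hypothetical helper names; only the airflow initdb and airflow webserver commands come from the question.

```python
import os
import socket
import subprocess
import sys
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.time() + timeout
    while True:
        try:
            # Raises socket.error if the port is closed, filtered, or unresolvable.
            sock = socket.create_connection((host, port), timeout=interval)
            sock.close()
            return True
        except socket.error:
            if time.time() >= deadline:
                return False
            time.sleep(interval)


def entrypoint(db_host="postgres", db_port=5432):
    """Run the startup steps strictly one after another.

    db_host and db_port are placeholders; substitute the real database endpoint.
    """
    if not wait_for_port(db_host, db_port, timeout=120):
        sys.exit("database not reachable at %s:%s" % (db_host, db_port))
    # check_call blocks until initdb exits and raises if it fails.
    subprocess.check_call(["airflow", "initdb"])
    # Replace this process with the webserver so it receives container signals.
    os.execvp("airflow", ["airflow", "webserver"])
```

Calling entrypoint() from the container's start script would replace a shell sequence that launches both commands back to back. A reachable port only proves network connectivity; it does not prove that credentials or the schema are valid.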

asked Apr 13 '18 20:04 by Cobman


1 Answer

So, when you installed from source you got the fix for https://issues.apache.org/jira/browse/AIRFLOW-1235, which I think restarts the master and workers when a worker dies. I've also seen my workers die when the MySQL session/connection goes bad, e.g. an exception from SQLAlchemy either about the transaction having failed due to a concurrency lock and needing to be retried (around which the Airflow models didn't have any logic), or an InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction. But not generally at startup.

The two times I had errors at startup were when the connection to the database could not be made because of a security group setting in AWS, and when our 3000+ DAGs took so long to get added to the DagBag that the timeout on the workers was getting tripped and they'd shut themselves down before the setup code was done. I would love to see this setup code improved or moved out of the workers.
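
To check whether the second case applies, one option is to time a DagBag load outside the webserver and compare it with the worker timeout (the startup log in the question shows Timeout: 120). This is a rough sketch, assuming that constructing DagBag from airflow.models approximates the work each worker does at boot. The helper names timed, within_timeout, and check_dag_folder are hypothetical, and the dags path is the one from the question's log.

```python
import time


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.time()
    result = fn()
    return result, time.time() - start


def within_timeout(elapsed, worker_timeout=120):
    """True if the measured load time fits inside the worker timeout."""
    return elapsed < worker_timeout


def load_dagbag(dag_folder="/usr/local/airflow/dags"):
    """Parse all DAG files in dag_folder, as the log line 'Filling up the DagBag' does."""
    # Imported lazily so the timing helpers also work where Airflow is not installed.
    from airflow.models import DagBag
    return DagBag(dag_folder=dag_folder)


def check_dag_folder(dag_folder="/usr/local/airflow/dags", worker_timeout=120):
    """Print how long parsing took and whether it fits the timeout."""
    dagbag, elapsed = timed(lambda: load_dagbag(dag_folder))
    verdict = "OK" if within_timeout(elapsed, worker_timeout) else "TOO SLOW"
    print("%d DAGs parsed in %.1fs (timeout %ds): %s"
          % (len(dagbag.dags), elapsed, worker_timeout, verdict))
```

Running check_dag_folder() inside the same container as the webserver gives the most representative number, since disk and CPU limits affect parse time. A result close to the timeout suggests raising the timeout or reducing the work done at DAG import time.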

answered Sep 17 '22 17:09 by dlamblin