My Airflow webserver suddenly stopped starting. When I try to start the webserver, the UI does not come up.
I tried resetting the database with airflow resetdb
and airflow initdb,
restarting all the services, downgrading Gunicorn and upgrading it again, and restarting my Linux machine. Nothing has changed.
The webserver logs are as follows:
[2019-05-17 08:08:00 +0000] [14978] [INFO] Starting gunicorn 19.9.0
[2019-05-17 08:08:00 +0000] [14978] [INFO] Listening at: http://0.0.0.0:8081 (14978)
[2019-05-17 08:08:00 +0000] [14978] [INFO] Using worker: sync
[2019-05-17 08:08:00 +0000] [14983] [INFO] Booting worker with pid: 14983
[2019-05-17 08:08:00 +0000] [14984] [INFO] Booting worker with pid: 14984
[2019-05-17 08:08:00 +0000] [14985] [INFO] Booting worker with pid: 14985
[2019-05-17 08:08:00 +0000] [14986] [INFO] Booting worker with pid: 14986
[2019-05-17 08:08:02,179] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,279] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,324] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,342] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,376] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,435] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,521] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,524] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14984)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14985)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14986)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14983)
[2019-05-17 08:10:01 +0000] [15161] [INFO] Booting worker with pid: 15161
[2019-05-17 08:10:01 +0000] [15164] [INFO] Booting worker with pid: 15164
[2019-05-17 08:10:01 +0000] [15167] [INFO] Booting worker with pid: 15167
[2019-05-17 08:10:01 +0000] [15168] [INFO] Booting worker with pid: 15168
[2019-05-17 08:10:03,953] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,007] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,020] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,036] {__init__.py:51} INFO - Using executor LocalExecutor
Has anyone encountered the same problem? Do you have any suggestions?
Webserver Health Check Endpoint: To check the health status of your Airflow instance, access the /health endpoint. It returns a JSON object that gives a high-level view of the instance's status.
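As a rough illustration of that check, here is a minimal Python sketch that requests /health and reduces the JSON to one status per component. The function names are made up for this example, and the base URL is an assumption taken from the gunicorn log in the question (port 8081). It also assumes your Airflow version exposes /health. If the gunicorn workers keep timing out, the request itself may simply hang until the timeout.

```python
import json
from urllib.request import urlopen


def summarize_health(payload):
    """Map each component in a /health payload to its reported status."""
    return {
        component: details.get("status", "unknown")
        for component, details in payload.items()
        if isinstance(details, dict)
    }


def fetch_health(base_url, timeout=5.0):
    """GET <base_url>/health and decode the JSON body (hypothetical helper)."""
    url = base_url.rstrip("/") + "/health"
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (address assumed from the gunicorn log in the question):
# print(summarize_health(fetch_health("http://localhost:8081")))
```

A response where every component reports "healthy" suggests the webserver itself is serving requests; no response at all points back at the worker timeouts in the log.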
The service logs are available at /media/ephemeral0/logs/airflow inside the cluster node. Since Airflow runs on a single-node machine, the logs are accessible on that same node. These logs are helpful for troubleshooting cluster bring-up and scheduling issues.
You can start, stop, or restart an Airflow service. The command for each service is given below, where <action> is start, stop, or restart:
Run sudo monit <action> scheduler for the Airflow Scheduler.
Run sudo monit <action> webserver for the Airflow Webserver.
Run sudo monit <action> worker for the Celery workers.
Run airflow dags list with the Airflow CLI to make sure that Airflow has registered the DAG in the metastore. If the DAG appears in the list, try restarting the webserver, then try restarting the scheduler. If you're using the Astro CLI, run astro dev restart.
Since Airflow 2.0, the default UI is the Flask App Builder RBAC UI. A webserver_config.py configuration file is generated automatically and can be used to configure Airflow to support authentication methods such as OAuth, OpenID, LDAP, and REMOTE_USER.
I faced the same issue today: the Airflow webserver stopped starting. I tried a lot but could not determine the cause. Nothing worked: neither resetdb nor upgradedb, and reinstalling didn't help either. Then I simply commented out all the code inside my DAGs and manually created a .pyc file of the DAGs in the DAG folder. Airflow started working again. I observed that the issue was with the DAGs: when I removed them, the server started functioning normally. So my advice to anyone facing this issue is to check your DAGs; there is definitely something wrong in them. Don't blame Airflow; sometimes our own code messes with the system.
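One way to find the offending DAG without commenting everything out is to run each file on its own with a time limit. The sketch below is only an approximation, not how Airflow itself loads DAGs: it runs every .py file in a separate Python interpreter, so imports that rely on Airflow's own sys.path setup may fail here even though they work in Airflow. The function name and the 30-second default are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path


def check_dag_files(dags_folder, timeout_seconds=30.0):
    """Run each .py file under dags_folder in its own interpreter.

    Returns a dict mapping file path to "ok", "timeout",
    or "error: <last line of stderr>".
    """
    results = {}
    for path in sorted(Path(dags_folder).rglob("*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # The file did not finish in time; a likely cause of stalls.
            results[str(path)] = "timeout"
            continue
        if proc.returncode == 0:
            results[str(path)] = "ok"
        else:
            stderr_lines = proc.stderr.strip().splitlines()
            last_line = stderr_lines[-1] if stderr_lines else "unknown"
            results[str(path)] = "error: " + last_line
    return results


# Example call (folder taken from the log in the question):
# for name, status in check_dag_files("/root/airflow/dags").items():
#     print(status, name)
```

Any file reported as "timeout" or "error" is a good first candidate to move out of the DAG folder before restarting the webserver.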
This is a possible solution that worked for me.
Make sure the dags_folder doesn't contain any files that are not relevant to your DAG definitions and configurations.
The Airflow webserver periodically scans the dags_folder, and I found that if this folder is very large, the scans cause the server to stall.
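To see what is actually sitting in that folder, a small inventory script can help. This is a minimal sketch with a made-up function name. It only counts files and bytes per extension and knows nothing about which files Airflow really parses.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(dags_folder):
    """Return {extension: (file_count, total_bytes)} for all files under dags_folder."""
    counts = Counter()
    sizes = Counter()
    for path in Path(dags_folder).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


# Example call (folder taken from the log in the question):
# for ext, (count, size) in sorted(summarize_folder("/root/airflow/dags").items()):
#     print(ext, count, "files,", size, "bytes")
```

Large counts or sizes for anything other than .py files (data dumps, virtualenvs, archives) are the kind of clutter worth moving out of the folder.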
Hope this helps you :)