I'm trying to run Airflow on an Ubuntu server with systemd. I have followed the quick start guide and the tutorial from the Airflow documentation, and I have managed to install Airflow and successfully run it using the command:
airflow webserver -p 8080
After setting up systemd, and a lot of trial and error with the configuration files, I managed to get Airflow running with the command
sudo systemctl start airflow
Airflow kept running for a week, until today, when I restarted it with the command
sudo systemctl restart airflow
Running sudo systemctl status airflow now gives me one of the following two messages:
● airflow.service - Airflow webserver daemon
Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2018-09-12 09:23:01 UTC; 1s ago
Process: 3115 ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon (code=exited, status=1/FAILURE)
Main PID: 3115 (code=exited, status=1/FAILURE)
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Main process exited, code=exited, status=1/FAILURE
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Unit entered failed state.
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Failed with result 'exit-code'.
or
● airflow.service - Airflow webserver daemon
Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-09-12 09:23:54 UTC; 1s ago
Main PID: 3399 (airflow)
Tasks: 1
Memory: 56.1M
CPU: 1.203s
CGroup: /system.slice/airflow.service
└─3399 /opt/miniconda3/bin/python /opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon
Sep 12 09:23:54 server-service systemd[1]: Stopped Airflow webserver daemon.
Sep 12 09:23:54 server-service systemd[1]: Started Airflow webserver daemon.
Sep 12 09:23:54 server-service airflow[3399]: [2018-09-12 09:23:54,372] {__init__.py:57} INFO - Using executor SequentialExecutor
Sep 12 09:23:55 server-service airflow[3399]: ____________ _____________
Sep 12 09:23:55 server-service airflow[3399]: ____ |__( )_________ __/__ /________ __
Sep 12 09:23:55 server-service airflow[3399]: ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
Sep 12 09:23:55 server-service airflow[3399]: ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
Sep 12 09:23:55 server-service airflow[3399]: _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
Sep 12 09:23:55 server-service airflow[3399]:
Sep 12 09:23:55 server-service airflow[3399]: [2018-09-12 09:23:55,124] [3399] {models.py:167} INFO - Filling up the DagBag from /root/airflow/dags
I think the first message is returned when systemd has failed to start Airflow, and the second message is returned while systemd is still in the process of starting it.
Since the first error message contains airflow.service: Service hold-off time over, scheduling restart., I thought I might have this problem, but running sudo systemctl enable airflow.service doesn't solve it (and airflow.service seems to be enabled already, as indicated by Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)).
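A quick way to see the full error behind these messages, assuming the unit is named airflow.service as above, is to read the unit's journal:
sudo journalctl -u airflow.service -e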
In trying to solve the problem I found some weird things that I don't understand:
According to the Airflow quick start page, running Airflow manually creates a file called airflow-webserver.pid in the Airflow home, while running Airflow with systemd creates a file called webserver.pid in the /run/airflow directory. Initially, when I tried to get Airflow running with systemd, I noticed that /run/airflow/webserver.pid was not created. Setting PIDFile=/home/user/airflow/airflow-webserver.pid solved the problem; systemd ran Airflow with the PID supplied in the airflow-webserver.pid file. But now that I've run sudo systemctl restart airflow, that doesn't work anymore; running airflow webserver -p 8080 doesn't create the airflow-webserver.pid file that I pointed to.
Since running Airflow no longer automatically creates the /run/airflow/webserver.pid or /home/user/airflow/airflow-webserver.pid files, I tried to create them manually in the desired directories. But if I run Airflow with systemd after creating the /run/airflow/webserver.pid file, it gets removed (and not replaced), and if I run Airflow manually with airflow webserver -p 8080 after creating the /run/airflow/webserver.pid file, that file also gets removed.
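One way to watch what actually happens to these files (same paths as above) is to start the webserver in the foreground and, from a second shell, list both candidate locations while it is running:
airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid
# in another shell, while the webserver is up:
ls -l /run/airflow/ /home/user/airflow/airflow-webserver.pid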
My airflow.service file looks like this:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/etc/sysconfig/airflow
PIDFile=/home/user/airflow/airflow-webserver.pid
User=%i
Group=%i
Type=simple
ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
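For completeness: after each change to this file, I reload systemd's configuration before restarting the unit:
sudo systemctl daemon-reload
sudo systemctl restart airflow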
Question: How do I solve these issues so that I can get Airflow running with systemd?
Edit: After restarting the systemd daemon again, I've managed to get Airflow running (or at least it seems so). Running systemctl status airflow returns:
● airflow.service - Airflow webserver daemon
Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-09-12 10:49:17 UTC; 6min ago
Main PID: 30054
Tasks: 0
Memory: 388.0K
CPU: 2.987s
CGroup: /system.slice/airflow.service
Sep 12 10:49:22 server-service airflow[30031]: File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
Sep 12 10:49:22 server-service airflow[30031]: reraise(type(exception), exception, tb=exc_tb, cause=cause)
Sep 12 10:49:22 server-service airflow[30031]: File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
Sep 12 10:49:22 server-service airflow[30031]: raise value.with_traceback(tb)
Sep 12 10:49:22 server-service airflow[30031]: File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
Sep 12 10:49:22 server-service airflow[30031]: context)
Sep 12 10:49:22 server-service airflow[30031]: File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
Sep 12 10:49:22 server-service airflow[30031]: cursor.execute(statement, parameters)
Sep 12 10:49:22 server-service airflow[30031]: sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection [SQL: 'SELECT connection.conn_id AS connection_conn_id \nFROM connection G
Sep 12 10:49:23 server-service systemd[1]: airflow.service: Supervising process 30054 which is not our child. We'll most likely not notice when it exits.
Unfortunately, I can't access Airflow in my browser. Moreover, starting Airflow with systemd or manually does not produce the desired files /run/airflow/webserver.pid and /home/user/airflow/airflow-webserver.pid. I've tried to check whether they exist elsewhere with sudo find ~/ -type f -name "webserver.pid", but this doesn't return anything.
I think that the message Supervising process 30054 which is not our child. We'll most likely not notice when it exits. has something to do with my problem, since I did not get this message when Airflow was running successfully with systemd in the past. Could it be that systemctl status airflow indicates that Airflow has been running for 6 min because systemd doesn't notice that the worker with PID 30054 is no longer active?
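If that is what is happening, one hypothesis is that the --daemon flag in my ExecStart is to blame: with Type=simple, systemd supervises the exact process it spawned, and a flag that makes the webserver fork into the background would leave systemd watching a process that exits almost immediately. A variant of the [Service] section without that flag (everything else as in my file above) would look like this:
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=%i
Group=%i
Type=simple
ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true
The alternative would be to keep --daemon and switch to Type=forking together with the PIDFile= setting, so that systemd tracks the forked child instead.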
Edit 2: I have found out why the airflow-webserver.pid "is not created" by Airflow. When you run airflow webserver -p 8080, Airflow does create the .pid file, but when you stop the webserver, systemd deletes the .pid file again (if Airflow does not do so itself). This explains why the airflow-webserver.pid was not there, but it does not explain why the webserver.pid is not in the /run/airflow directory.
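The lifecycle, as I now understand it, looks like this (paths from my setup; the PID value is illustrative):
airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid
# while the webserver is running:
cat /home/user/airflow/airflow-webserver.pid   # e.g. 3399
# after the webserver stops, the file is gone again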
I know I'm digging up a slightly dated post, but I too was trying to figure out why I could not get the scheduler to run automatically when the server is running.
I did find a solution that works for me on Ubuntu 18.04 and 18.10, so hopefully this helps.
I provided a full write-up of how to install Airflow and PostgreSQL on the backend at the link here.
(From the later part of my article:) Essentially, it comes down to making a specific change to the airflow-scheduler.service file.
This is one of the ‘gotchas’ for an implementation on Ubuntu. The dev team that created Airflow designed it to run on a different distribution of Linux, and therefore there is a small (but critical) change that needs to be made so that Airflow will automatically run when the server is on. The default systemd service files initially look like this:
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
However, this will not work, as the ‘EnvironmentFile’ protocol doesn’t fly on Ubuntu 18 (there is no /etc/sysconfig directory on Ubuntu). Instead, comment out that line and add in:
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
You will likely want to create a systemd service file at least for the Airflow scheduler, and probably also for the webserver if you want the UI to launch automatically as well. Indeed we do want both in this implementation, so we will be creating two files, airflow-scheduler.service & airflow-webserver.service. Both will be copied to the /etc/systemd/system folder. These are as follows:
#airflow-scheduler.service
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
#EnvironmentFile=/etc/default/airflow
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=airflow
Group=airflow
Type=simple
ExecStart=/home/ubuntu/anaconda3/envs/airflow/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
#airflow-webserver.service
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
#EnvironmentFile=/etc/default/airflow
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=airflow
Group=airflow
Type=simple
ExecStart=/home/ubuntu/anaconda3/envs/airflow/bin/airflow webserver -p 8085 --pid /home/ubuntu/airflow/airflow-webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Finally, with both of those files copied to the /etc/systemd/system folder by way of a superuser copy command (sudo cp), it is time to hit the ignition:
sudo systemctl enable airflow-scheduler
sudo systemctl start airflow-scheduler
sudo systemctl enable airflow-webserver
sudo systemctl start airflow-webserver
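If everything worked, both units should now report active (running); a quick way to check, and to read the journal in case one of them did not come up:
sudo systemctl status airflow-scheduler airflow-webserver
sudo journalctl -u airflow-webserver -e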