The Airflow docs include instructions for the systemd integration.
What I want is for the scheduler to be restarted on its own every time it stops working. Usually I start it manually with airflow scheduler -D,
but sometimes it stops when I'm not available.
Reading the docs, I'm not sure about the configs.
The Airflow GitHub repo contains the following files:
airflow
airflow-scheduler.service
airflow.conf
I'm running Ubuntu 16.04.
Airflow is installed at:
/home/ubuntu/airflow
and I have the path:
/etc/systemd
The docs say to:
Copy (or link) them to /usr/lib/systemd/system
Copy the airflow.conf to /etc/tmpfiles.d/
What is tmpfiles.d?
What is # AIRFLOW_CONFIG= in the airflow file?
Or, in other words... is there a more "down to earth" guide on how to do it?
We first need to download the service definition files from the Apache Airflow GitHub repo, then put them into the correct system directories. We also need to create some folders because the daemon will need them to run correctly.
The Airflow scheduler is designed to run as a persistent service in an Airflow production environment. To kick it off, all you need to do is execute the airflow scheduler command. It uses the configuration specified in airflow.cfg.
The PID file for the webserver will be stored in $AIRFLOW_HOME/airflow-webserver.pid, or in /run/airflow/webserver.pid when run by systemd.
CLI check for the scheduler: at startup the scheduler creates a BaseJob entry with information about the host and a timestamp (heartbeat), and then updates it regularly. You can use this to check whether the scheduler is working correctly by running the airflow jobs check command; on failure, the command exits with a non-zero error code.
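For example (the jobs check subcommand comes from the Airflow 2.x CLI; flags may differ on older versions):

# exits non-zero if no live SchedulerJob heartbeat is found
airflow jobs check --job-type SchedulerJob --hostname "$(hostname)"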
Integrating Airflow with systemd makes watching your daemons easy, since systemd can take care of restarting a daemon on failure. It also lets you start the Airflow webserver and scheduler automatically on system boot.
Edit the airflow file from the systemd folder in the Airflow GitHub repo to set the environment variables AIRFLOW_CONFIG, AIRFLOW_HOME, and SCHEDULER_RUNS for your current configuration.
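A minimal sketch of that environment file, assuming Airflow is installed at /home/ubuntu/airflow as in the question (adjust the paths to your setup). The # AIRFLOW_CONFIG= line you asked about is simply this variable shipped commented out; uncomment it and point it at your airflow.cfg:

# /etc/sysconfig/airflow -- environment for the Airflow systemd units
AIRFLOW_CONFIG=/home/ubuntu/airflow/airflow.cfg
AIRFLOW_HOME=/home/ubuntu/airflow
SCHEDULER_RUNS=5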
Copy the service files (the files with the .service extension) to /usr/lib/systemd/system on the VM.
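For example, assuming the downloaded files sit in your current directory:

sudo cp *.service /usr/lib/systemd/system/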
Copy the airflow.conf file to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/. tmpfiles.d is systemd's mechanism for creating and cleaning up volatile files and directories at boot (see man tmpfiles.d). Copying airflow.conf there ensures /run/airflow is created with the right owner and permissions (0755 airflow airflow). Check whether /run/airflow exists and is owned by the airflow user and group; if it doesn't, create the /run/airflow folder with those permissions.
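The airflow.conf shipped in the repo is a one-line tmpfiles.d rule along these lines (the leading D means "create this directory at boot"):

D /run/airflow 0755 airflow airflow

To create the folder by hand instead:

sudo mkdir -p /run/airflow
sudo chown airflow:airflow /run/airflow
sudo chmod 0755 /run/airflow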
Enable these services by issuing systemctl enable <service> on the command line as shown below.
sudo systemctl enable airflow-webserver
sudo systemctl enable airflow-scheduler
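Once enabled, start the services and verify the scheduler is up; with Restart=always in the unit file below, systemd restarts it automatically whenever it dies:

sudo systemctl start airflow-webserver
sudo systemctl start airflow-scheduler
sudo systemctl status airflow-scheduler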
The airflow-scheduler.service file should look like this:
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
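Two values in this file commonly need adjusting, so treat them as assumptions to verify on your machine: /etc/sysconfig/ is a Red Hat convention, so on Ubuntu either create that file or point EnvironmentFile at e.g. /etc/default/airflow; and /bin/airflow must match wherever pip actually installed the airflow executable. If you run Airflow as the ubuntu user rather than a dedicated airflow user, also change User=, Group=, and the tmpfiles.d ownership to match.

# find the real path of the airflow binary, then set ExecStart accordingly,
# e.g. ExecStart=/usr/local/bin/airflow scheduler
which airflow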
Your question is a bit old, but I just discovered it because I'm interested in the same subject at the moment. I think the answer to your question is here:
https://medium.com/@shahbaz.ali03/run-apache-airflow-as-a-service-on-ubuntu-18-04-server-b637c03f4722