 

How to use Airflow scheduler with systemd?

Tags:

airflow

The docs specify instructions for the integration

What I want is that every time the scheduler stops working, it is restarted on its own. Usually I start it manually with airflow scheduler -D, but sometimes it stops while I'm not available.

Reading the docs I'm not sure about the configs.

The Airflow GitHub repo contains the following files:

airflow
airflow-scheduler.service
airflow.conf

I'm running Ubuntu 16.04

Airflow is installed in:

/home/ubuntu/airflow

I have the path:

/etc/systemd

The docs say to:

Copy (or link) them to /usr/lib/systemd/system

  1. Copy which of the files?

copy the airflow.conf to /etc/tmpfiles.d/

  2. What is tmpfiles.d?

  3. What is # AIRFLOW_CONFIG= in the airflow file?

Or, in other words: is there a more "down to earth" guide on how to do it?

Programmer120 asked Oct 23 '18 12:10


People also ask

How do I run Apache airflow as daemon using Linux Systemd?

We first need to download the service definition files from the Apache Airflow GitHub repo, then put them into the correct system directories. We also need to create some folders because the daemon will need them to run correctly.
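The folder layout those steps refer to can be sketched as below. The ROOT variable is a stand-in added for illustration: set ROOT=/ (and run with sudo) for a real install; the default here is a scratch directory so the commands are safe to try out.

```shell
# Sketch of the directories the Airflow daemons need (assumed layout).
# ROOT is hypothetical scaffolding for this example, not part of the docs.
ROOT="${ROOT:-$(mktemp -d)}"

mkdir -p "$ROOT/usr/lib/systemd/system"   # .service unit files are copied here
mkdir -p "$ROOT/etc/tmpfiles.d"           # the airflow.conf tmpfiles snippet goes here
mkdir -p "$ROOT/run/airflow"              # PID files for the daemons live here at runtime

ls "$ROOT"
```

On a real system, /run is a tmpfs recreated at every boot, which is why /run/airflow has to be declared via tmpfiles.d rather than created once by hand.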

How do I start an airflow scheduler?

The Airflow scheduler is designed to run as a persistent service in an Airflow production environment. To kick it off, all you need to do is execute the airflow scheduler command. It uses the configuration specified in airflow.cfg.

Where is airflow scheduler PID?

The PID file for the webserver will be stored in $AIRFLOW_HOME/airflow-webserver.pid or in /run/airflow/webserver.pid.

How do I know if the airflow scheduler is running?

At startup, the scheduler creates a BaseJob record with information about the host and a timestamp (heartbeat), and then updates it regularly. You can use this to check whether the scheduler is working correctly. To do this, use the airflow jobs check command. On failure, the command will exit with a non-zero error code.
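The exit-code behaviour above can be wrapped in a small script. check_scheduler is a hypothetical helper written for this sketch; in production its argument would be `airflow jobs check --job-type SchedulerJob`, but `true` and `false` stand in here so the sketch runs without an Airflow installation.

```shell
# Hedged sketch: report scheduler health from the exit code of a check command.
check_scheduler() {
    # "$@" is the health-check command to run; on a real host, use:
    #   airflow jobs check --job-type SchedulerJob
    if "$@" >/dev/null 2>&1; then
        echo "scheduler healthy"
    else
        echo "scheduler down"
    fi
}

check_scheduler true    # stand-in for a passing `airflow jobs check`
check_scheduler false   # stand-in for a failing one
```

A wrapper like this is handy in cron jobs or monitoring probes, since it converts the exit code into a readable status line.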


2 Answers

Integrating Airflow with systemd files makes watching your daemons easy, as systemd can take care of restarting a daemon on failure. It also enables the airflow webserver and scheduler to start automatically on system boot.

Edit the airflow file from the systemd folder in the Airflow GitHub repo to match your current configuration, setting the environment variables AIRFLOW_CONFIG, AIRFLOW_HOME and SCHEDULER.
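As a minimal sketch, assuming Airflow lives at /home/ubuntu/airflow as in the question, the edited environment file (installed at /etc/sysconfig/airflow, the path the unit files reference) might look like this; note Ubuntu has no /etc/sysconfig by default, so either create it or adjust EnvironmentFile= in the units:

```shell
# /etc/sysconfig/airflow -- environment for the Airflow systemd units
# (paths assumed from the question; adjust to your install)
AIRFLOW_CONFIG=/home/ubuntu/airflow/airflow.cfg
AIRFLOW_HOME=/home/ubuntu/airflow
```

This answers question 3 above: # AIRFLOW_CONFIG= in the shipped file is simply a commented-out variable you are expected to uncomment and point at your own airflow.cfg.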

Copy the services files (the files with .service extension) to /usr/lib/systemd/system in the VM.

Copy the airflow.conf file to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/. Copying airflow.conf ensures /run/airflow is created with the right owner and permissions (0755 airflow airflow). Check whether /run/airflow exists and is owned by the airflow user and group; if it doesn't, create the /run/airflow folder with those permissions.
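tmpfiles.d is systemd's mechanism for declaring files and directories that must be (re)created at boot; systemd-tmpfiles reads these snippets and applies them, which matters because /run is a tmpfs wiped on every reboot. At the time of writing, the repo's airflow.conf contains a single rule along these lines ("D" means create the directory, with the given mode, user and group):

```
D /run/airflow 0755 airflow airflow
```

Once the file is in place you can apply it immediately, without rebooting, by running sudo systemd-tmpfiles --create.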

Enable these services by issuing systemctl enable <service> on the command line as shown below.

sudo systemctl enable airflow-webserver
sudo systemctl enable airflow-scheduler

The airflow-scheduler.service file should be as below:

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
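For completeness, a companion webserver unit can be sketched along the same lines. This is modeled on the scheduler unit above rather than copied from any official source, so treat the ExecStart path and the --pid location as assumptions to adjust for your install:

```ini
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow webserver --pid /run/airflow/webserver.pid
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```

Restart=always on the scheduler and Restart=on-failure here are what give you the "restart on its own" behaviour asked about in the question.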
kaxil answered Oct 12 '22 13:10


Your question is a bit old, but I just discovered it, because I'm interested in the same subject at the moment. I think the answer to your question is here:

https://medium.com/@shahbaz.ali03/run-apache-airflow-as-a-service-on-ubuntu-18-04-server-b637c03f4722

cherah30 answered Oct 12 '22 15:10