
Trying to run apache airflow on ubuntu server with systemd

I'm trying to run Airflow on an Ubuntu server with systemd. I have followed the quick start guide and the tutorial from the Airflow documentation, and I have managed to install Airflow and successfully run it by using the command:

airflow webserver -p 8080

After installing systemd and a lot of trial and error with the configuration files I managed to get airflow running with the command

sudo systemctl start airflow

Airflow kept running for a week until today I restarted it with the command

sudo systemctl restart airflow

Running sudo systemctl status airflow now gives me one of the following two messages:

● airflow.service - Airflow webserver daemon
 Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
 Active: activating (auto-restart) (Result: exit-code) since Wed 2018-09-12 09:23:01 UTC; 1s ago
Process: 3115 ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon (code=exited, status=1/FAILURE)
Main PID: 3115 (code=exited, status=1/FAILURE)

Sep 12 09:23:01 server-service systemd[1]: airflow.service: Main process exited, code=exited, status=1/FAILURE
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Unit entered failed state.
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Failed with result 'exit-code'.

or

● airflow.service - Airflow webserver daemon
 Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
 Active: active (running) since Wed 2018-09-12 09:23:54 UTC; 1s ago
Main PID: 3399 (airflow)
  Tasks: 1
 Memory: 56.1M
    CPU: 1.203s
 CGroup: /system.slice/airflow.service
         └─3399 /opt/miniconda3/bin/python /opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon

Sep 12 09:23:54 server-service systemd[1]: Stopped Airflow webserver daemon.
Sep 12 09:23:54 server-service systemd[1]: Started Airflow webserver daemon.
Sep 12 09:23:54 server-service airflow[3399]: [2018-09-12 09:23:54,372] {__init__.py:57} INFO - Using executor SequentialExecutor
Sep 12 09:23:55 server-service airflow[3399]:   ____________       _____________
Sep 12 09:23:55 server-service airflow[3399]:  ____    |__( )_________  __/__  /________      __
Sep 12 09:23:55 server-service airflow[3399]: ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
Sep 12 09:23:55 server-service airflow[3399]: ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
Sep 12 09:23:55 server-service airflow[3399]:  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
Sep 12 09:23:55 server-service airflow[3399]:  
Sep 12 09:23:55 server-service airflow[3399]: [2018-09-12 09:23:55,124] [3399] {models.py:167} INFO - Filling up the DagBag from /root/airflow/dags

I think the first message is returned when systemd has failed to start airflow and the second message is returned when systemd is still in the process of starting airflow.

Since the first error message contains airflow.service: Service hold-off time over, scheduling restart. I thought I might have this problem, but running sudo systemctl enable airflow.service doesn't solve it (and I think airflow.service is enabled anyway, as indicated here: Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)).

In trying to solve the problem I found some weird things that I don't understand:

  • According to the Airflow quick start page, running airflow manually creates a file called airflow-webserver.pid in the Airflow home, while running Airflow with systemd creates a file called webserver.pid in the /run/airflow directory. Initially, when I tried to get Airflow running with systemd, I noticed that /run/airflow/webserver.pid was not created. Setting PIDFile=/home/user/airflow/airflow-webserver.pid solved the problem; systemd ran Airflow with the pid supplied in the airflow-webserver.pid file. But now that I've run sudo systemctl restart airflow, that no longer works; running airflow webserver -p 8080 doesn't create the airflow-webserver.pid file that I pointed to.

  • Since running Airflow no longer automatically creates the /run/airflow/webserver.pid or /home/user/airflow/airflow-webserver.pid files, I tried to create them manually in the desired directories. But if I run Airflow with systemd after creating the /run/airflow/webserver.pid file, it gets removed (and not replaced), and if I run Airflow manually with airflow webserver -p 8080 after creating the /run/airflow/webserver.pid file, that file also gets removed.
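Incidentally, as far as I can tell, systemd's PIDFile= check boils down to "does the recorded pid refer to a live process?". A minimal stand-in (sleep in place of the webserver, a scratch pid file; nothing here touches the real service) shows the mechanics:

```shell
# Stand-in daemon: background a process and record its pid, the way
# `airflow webserver --pid <file> --daemon` is supposed to.
sleep 30 &
echo $! > /tmp/demo-webserver.pid

# systemd's PIDFile= check amounts to: is the recorded pid alive?
# `kill -0` tests for process existence without sending a signal.
pid=$(cat /tmp/demo-webserver.pid)
if kill -0 "$pid" 2>/dev/null; then
    echo "pid $pid is alive"
else
    echo "pid $pid is stale"
fi

# Clean up the stand-in.
kill "$pid"
rm /tmp/demo-webserver.pid
```

If the pid file is missing, or names a pid that has already exited, systemd considers the service dead regardless of what is actually running.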

My airflow.service file looks like this:

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
PIDFile=/home/user/airflow/airflow-webserver.pid
User=%i
Group=%i
Type=simple
ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon

Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
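For reference, one detail I keep wondering about: Type=simple tells systemd that the ExecStart process stays in the foreground, while --daemon makes the webserver fork into the background, so the process systemd tracks exits almost immediately. A foreground variant of my unit (an untested sketch; same paths as above, with a concrete user in place of %i) would be:

```ini
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=user
Group=user
Type=simple
; No --daemon and no PIDFile: with Type=simple the process stays in the
; foreground, so systemd supervises it directly and needs no pid file.
ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080

Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```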

Question: How do I solve these issues so that I can get airflow running with systemd?

Edit: After restarting the systemd daemon again I've managed to get airflow running (or at least it seems so). Running systemctl status airflow returns:

● airflow.service - Airflow webserver daemon
   Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-09-12 10:49:17 UTC; 6min ago
 Main PID: 30054
    Tasks: 0
   Memory: 388.0K
      CPU: 2.987s
   CGroup: /system.slice/airflow.service

Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
Sep 12 10:49:22 server-service airflow[30031]:     reraise(type(exception), exception, tb=exc_tb, cause=cause)
Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
Sep 12 10:49:22 server-service airflow[30031]:     raise value.with_traceback(tb)
Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
Sep 12 10:49:22 server-service airflow[30031]:     context)
Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
Sep 12 10:49:22 server-service airflow[30031]:     cursor.execute(statement, parameters)
Sep 12 10:49:22 server-service airflow[30031]: sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection [SQL: 'SELECT connection.conn_id AS connection_conn_id \nFROM connection G
Sep 12 10:49:23 server-service systemd[1]: airflow.service: Supervising process 30054 which is not our child. We'll most likely not notice when it exits.

Unfortunately, I can't access Airflow in my browser. Moreover, starting Airflow with systemd or manually does not produce the desired files /run/airflow/webserver.pid and /home/user/airflow/airflow-webserver.pid. I've tried to check whether they exist elsewhere with sudo find ~/ -type f -name "webserver.pid", but this doesn't return anything.
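Note that sudo find ~/ only searches my home directory. A broader search would scan from the filesystem root; sketched here against a scratch directory so it is safe to run anywhere (on the server, point it at / instead):

```shell
# Scratch directory with a dummy pid file to search for.
dir=$(mktemp -d)
touch "$dir/airflow-webserver.pid"

# The actual search: any *.pid file under the given root.
# On the server this would be: sudo find / -xdev -type f -name "*.pid" 2>/dev/null
found=$(find "$dir" -type f -name "*.pid")
echo "$found"

rm -r "$dir"
```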

I think that the message Supervising process 30054 which is not our child. We'll most likely not notice when it exits. has something to do with my problem, since I did not get this message when Airflow was running successfully with systemd in the past. Could it be that systemctl status airflow indicates that Airflow has been running for 6 min because systemd doesn't notice that the worker with pid 30054 is no longer active?
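A small experiment hints at what "not our child" means: a process that daemonizes (as --daemon does, via a double fork) is re-parented away from whoever launched it, so systemd can only poll the pid from the pid file instead of supervising a direct child. A safe stand-in:

```shell
# Launch a background process from a subshell and let the subshell exit,
# mimicking the double-fork a --daemon flag performs.
( sleep 30 & echo $! > /tmp/reparent-demo.pid )
sleep 1  # give the kernel a moment to re-parent the orphan

# The orphan's parent is now init (or a subreaper), not this shell.
pid=$(cat /tmp/reparent-demo.pid)
ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
echo "launcher pid: $$, daemonized child's parent: $ppid"

kill "$pid"
rm /tmp/reparent-demo.pid
```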

Edit 2: I have found out why the airflow-webserver.pid "is not created" by Airflow. When you run airflow webserver -p 8080, Airflow does create the .pid file, but when you stop the webserver, systemd deletes the .pid file again (if Airflow does not do so itself). This explains why the airflow-webserver.pid was not there, but it does not explain why the webserver.pid is not in the /run/airflow directory.

asked Sep 12 '18 by Mr. President


1 Answer

I know I'm digging up a slightly dated post, but I too was trying to figure out why I could not get the scheduler to run automatically when the server is running.

I did find a solution that works for me on Ubuntu 18.04 and 18.10, so hopefully this helps.

I provided a full write-up of how to install Airflow and PostgreSQL on the backend at the link here.

From the later part of my article: essentially, it comes down to making a specific change to the airflow-scheduler.service file.

This is one of the ‘gotchas’ for an implementation on Ubuntu. The dev team that created Airflow designed it to run on a different distribution of Linux, and therefore there is a small (but critical) change that needs to be made so that Airflow will automatically run when the server is on. The default systemd service files initially look like this:

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

However, this will not work, as the ‘EnvironmentFile’ protocol doesn’t fly on Ubuntu 18. Instead, comment out that line and add in:

Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
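The reason this PATH line matters: systemd does not read your login shell's profile, so a unit resolves commands only against the PATH you give it. A stub demonstration (scratch paths, not my real env) of how resolution changes once the env's bin directory is prepended:

```shell
# Fake 'airflow' executable in a scratch directory, standing in for
# the real one inside the conda env's bin directory.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho stub\n' > "$bindir/airflow"
chmod +x "$bindir/airflow"

# Minimal PATH, as a unit would have without the Environment= line:
env PATH=/usr/bin:/bin sh -c 'command -v airflow || echo "airflow: not found"'

# PATH with the env's bin dir prepended, as the Environment= line does:
hit=$(env PATH="$bindir:/usr/bin:/bin" sh -c 'command -v airflow')
echo "$hit"

rm -r "$bindir"
```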

You will likely want to create a systemd service file at least for the Airflow Scheduler and also probably the Webserver if you want the UI to launch automatically as well. Indeed we do want both in this implementation, so we will be creating two files, airflow-scheduler.service & airflow-webserver.service. Both of which will be copied to the /etc/systemd/system folder. These are as follows:


airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
#EnvironmentFile=/etc/default/airflow
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=airflow
Group=airflow
Type=simple
ExecStart=/home/ubuntu/anaconda3/envs/airflow/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

airflow-webserver.service

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
#EnvironmentFile=/etc/default/airflow
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=airflow
Group=airflow
Type=simple
ExecStart=/home/ubuntu/anaconda3/envs/airflow/bin/airflow webserver -p 8085 --pid /home/ubuntu/airflow/airflow-webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Finally, with both of those files copied to the /etc/systemd/system folder by way of a superuser copy command sudo cp, it is time to hit the ignition:

sudo systemctl enable airflow-scheduler
sudo systemctl start airflow-scheduler
sudo systemctl enable airflow-webserver
sudo systemctl start airflow-webserver

answered Sep 21 '22 by Merlin