Airflow DAGs and PYTHONPATH

I have some DAGs that can't seem to locate Python modules. Inside the Airflow UI, I see many variations of this message:

Broken DAG: [/home/airflow/source/airflow/dags/test.py] No module named 'paramiko'

Inside a DAG file I can modify sys.path directly, and that seems to mitigate the issue:

import sys
sys.path.append('/home/airflow/.local/lib/python2.7/site-packages')

That doesn't feel right, though, having to set the path in my code directly. I've tried exporting PYTHONPATH in the Airflow user account's .bashrc, but it doesn't seem to be read when the DAG jobs are executed. What's the correct way to go about this?
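
For reference, I can see which interpreter and search path the scheduler actually uses by logging them at the top of a DAG file; a minimal sketch:

import logging
import sys

# Log the interpreter and search path this DAG file is parsed with;
# whatever shows up in the scheduler logs is what Airflow really uses.
logging.info('interpreter: %s', sys.executable)
logging.info('sys.path: %s', sys.path)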

Thanks.

----- update -----

Thanks for the responses.

Below are my systemd unit files.

::::::::::::::
airflow-scheduler-airflow2.service
::::::::::::::
[Unit]
Description=Airflow scheduler daemon

[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
::::::::::::::
airflow-webserver-airflow2.service
::::::::::::::
[Unit]
Description=Airflow webserver daemon

[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow webserver
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

This is the EnvironmentFile content used above:

more /usr/local/airflow/instances/airflow2/etc/envars
PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg
asked Jun 06 '18 by sebastian


People also ask

What are DAGs in Airflow?

In Airflow, a DAG – or a Directed Acyclic Graph – is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
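
For example, a minimal DAG script (names and schedule are illustrative, using the classic Airflow 1.x imports) might look like this:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Two tasks with one dependency: load runs only after extract succeeds.
dag = DAG(
    dag_id='example_dag',
    start_date=datetime(2018, 6, 1),
    schedule_interval='@daily',
)

extract = BashOperator(task_id='extract', bash_command='echo extract', dag=dag)
load = BashOperator(task_id='load', bash_command='echo load', dag=dag)

extract >> load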

How does Airflow import DAGs?

Airflow loads DAGs from Python source files, which it looks for inside its configured DAG_FOLDER. It will take each file, execute it, and then load any DAG objects from that file. This means you can define multiple DAGs per Python file, or even spread one very complex DAG across multiple Python files using imports.
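
For instance, a DAG file can import a sibling module from the same folder (a hypothetical helpers.py living next to the DAG file). This is exactly the kind of import that breaks when the dags folder is missing from the interpreter's search path, as in the question above:

# dags/helpers.py: hypothetical shared module
def make_command(name):
    # Build a shell command string shared by several DAGs.
    return 'echo running %s' % name

# dags/uses_helper.py: imports the sibling module above; this import only
# resolves if the dags folder itself is on sys.path (or PYTHONPATH)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

from helpers import make_command

dag = DAG(dag_id='uses_helper', start_date=datetime(2018, 6, 1))
run = BashOperator(task_id='run', bash_command=make_command('run'), dag=dag)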

Where does Airflow look for DAGs?

Airflow looks in your DAGS_FOLDER for modules that contain DAG objects in their global namespace and adds the objects it finds to the DagBag.
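
That folder is set by dags_folder in airflow.cfg. With the AIRFLOW_HOME from the question above, a typical entry (path illustrative; the default is $AIRFLOW_HOME/dags) would be:

[core]
dags_folder = /usr/local/airflow/instances/airflow2/home/dags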

How is DAG executed in Airflow?

The execution of the DAG depends on its containing tasks and their dependencies. The status is assigned to the DAG Run when all of the tasks are in one of the terminal states (i.e. states with no possible transition to another state), such as success, failed, or skipped.


2 Answers

I had a similar issue:

  1. Python wasn't loaded from the virtualenv used to run Airflow (fixing this made Airflow's dependencies resolve from the virtualenv).
  2. Submodules under the dags path weren't loaded because of a different base path (fixing this made my own modules under the dags folder importable).

I added the following lines to the environment file for the systemd service (/usr/local/airflow/instances/airflow2/etc/envars in your case):

source /home/ubuntu/venv/airflow/bin/activate
PYTHONPATH=/home/ubuntu/venv/airflow/dags:$PYTHONPATH
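
In your case that would mean extending /usr/local/airflow/instances/airflow2/etc/envars along these lines (the dags path is a guess based on your AIRFLOW_HOME):

PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg
PYTHONPATH=/usr/local/airflow/instances/airflow2/home/dags

Then restart the services (systemctl restart airflow-scheduler-airflow2 airflow-webserver-airflow2) so they pick up the new environment.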
answered Nov 14 '22 by Andrey


It looks like your Python environment is degraded: you have multiple instances of Python on your VM (Python 3.6 and Python 2.7) and multiple instances of pip. The pip tied to Python 3.6 is being used, but all of your modules are actually installed under Python 2.7.

This can be solved easily by using symbolic links to redirect to 2.7.

Type these commands and see which instance of Python each one runs (2.7.5, 2.7.14, 3.6, etc.):

  1. python
  2. python2
  3. python2.7

Or type which python to find which Python binary your VM is using. You can also run which pip to see which pip is being used.
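
You can also confirm the mismatch with pip --version, which reports the Python it is bound to; illustrative output:

$ pip --version
pip 9.0.3 from /usr/lib/python3.6/site-packages/pip (python 3.6)
$ pip2 --version
pip 9.0.3 from /home/airflow/.local/lib/python2.7/site-packages/pip (python 2.7)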

I am going to assume python and which python lead to Python 3 (which you do not want to use), but python2 and python2.7 lead to the instance you do want to use.

To make /home/airflow/.local/lib/python2.7/ the version that gets used, create the following symbolic links:

  1. cd /home/airflow/.local/lib/python2.7
  2. ln -s python2 python
  3. ln -s /home/airflow/.local/lib/python2.7 python2

The symbolic link structure is ln -s TARGET LINKNAME. You are essentially saying: when you run the command python, go to python2; when python2 is run, go to /home/airflow/.local/lib/python2.7. It's all being redirected.

Now re-run the three commands above (python, python2, python2.7). All should lead to the Python instance you want.

Hope this helps!

answered Nov 14 '22 by Zack