I have some DAGs that can't seem to locate Python modules. In the Airflow UI, I see many variations of this message:
Broken DAG: [/home/airflow/source/airflow/dags/test.py] No module named 'paramiko'
Inside a DAG file I can modify Python's sys.path directly, and that seems to mitigate the issue:
import sys
sys.path.append('/home/airflow/.local/lib/python2.7/site-packages')
That doesn't feel right, though, having to set the path directly in my code. I've tried exporting PYTHONPATH in the Airflow user account's .bashrc, but it doesn't seem to be read when the DAG jobs are executed. What's the correct way to go about this?
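One way to see which interpreter and search path are really in play is a small diagnostic script, run with the same interpreter the Airflow services use. This is only a sketch: the helper name `describe_environment` is made up for illustration, and `paramiko` is simply the module from the error above.

```python
import importlib
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether module_name imports."""
    try:
        importlib.import_module(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,   # interpreter running this script
        "search_path": list(sys.path),  # directories searched for imports
        "importable": importable,
    }


if __name__ == "__main__":
    info = describe_environment("paramiko")
    print(info["executable"])
    print(info["importable"])
    for entry in info["search_path"]:
        print(entry)
```

If the site-packages directory that holds the module is missing from the printed search path, the problem is the environment the process starts with rather than the DAG code.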
Thanks.
----- update -----
Thanks for the responses.
Below are my systemd unit files.
::::::::::::::
airflow-scheduler-airflow2.service
::::::::::::::
[Unit]
Description=Airflow scheduler daemon
[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
::::::::::::::
airflow-webserver-airflow2.service
::::::::::::::
[Unit]
Description=Airflow webserver daemon
[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow webserver
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
These are the contents of the EnvironmentFile used above:
more /usr/local/airflow/instances/airflow2/etc/envars
PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg
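Since systemd services do not start a login shell, a PYTHONPATH exported in .bashrc would not reach them; the processes only see what the unit and its EnvironmentFile define. A rough sketch that parses the contents shown above as simple KEY=VALUE lines makes it easy to check what is and isn't set. The helper `parse_env_file` is hypothetical and is a simplification of systemd's real parsing rules.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines; skip blanks and comment lines."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Contents copied from the envars file shown above.
ENVARS = """\
PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg
"""

env = parse_env_file(ENVARS)
print(sorted(env))          # names the service receives from this file
print("PYTHONPATH" in env)  # False: nothing here extends the module search path
```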
In Airflow, a DAG (Directed Acyclic Graph) is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
Airflow loads DAGs from Python source files, which it looks for inside its configured DAGS_FOLDER. It will take each file, execute it, and then load any DAG objects from that file. This means you can define multiple DAGs per Python file, or even spread one very complex DAG across multiple Python files using imports.
Airflow looks in your DAGS_FOLDER for modules that contain DAG objects in their global namespace and adds the objects it finds to the DagBag.
The execution of the DAG depends on its tasks and their dependencies. A status is assigned to the DAG Run when all of its tasks are in one of the terminal states (i.e. there is no possible transition to another state), such as success, failed or skipped.
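The loading behaviour described above (execute each file, then collect DAG objects from its global namespace) also explains the "Broken DAG" message: if an import at the top of a file fails, nothing in that file can be loaded. The toy loader below mimics the idea without Airflow installed. `ToyDag` and `load_toy_dags` are stand-ins invented for this sketch and are not Airflow's real classes or loading code.

```python
import os
import runpy
import tempfile


class ToyDag(object):
    """Stand-in for a DAG object; NOT Airflow's DAG class."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


def load_toy_dags(path):
    """Execute one file and collect ToyDag objects from its globals."""
    try:
        namespace = runpy.run_path(path, init_globals={"ToyDag": ToyDag})
    except ImportError as exc:
        # A failing import stops the whole file from loading.
        return [], "Broken DAG: [%s] %s" % (path, exc)
    dags = [v for v in namespace.values() if isinstance(v, ToyDag)]
    return dags, None


def write_file(folder, name, source):
    """Write source to folder/name and return the full path."""
    path = os.path.join(folder, name)
    with open(path, "w") as handle:
        handle.write(source)
    return path


folder = tempfile.mkdtemp()  # plays the role of the DAGs folder
good = write_file(folder, "good.py", "a = ToyDag('first')\nb = ToyDag('second')\n")
bad = write_file(folder, "bad.py", "import no_such_module_abc\nc = ToyDag('third')\n")

good_dags, good_error = load_toy_dags(good)
bad_dags, bad_error = load_toy_dags(bad)
print(sorted(d.dag_id for d in good_dags))
print(bad_error)
```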
I had a similar issue involving the dags folder. I added the following lines to the environment file for the systemd service (/usr/local/airflow/instances/airflow2/etc/envars in your case):
source /home/ubuntu/venv/airflow/bin/activate
PYTHONPATH=/home/ubuntu/venv/airflow/dags:$PYTHONPATH
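To illustrate what setting PYTHONPATH does, the sketch below starts a fresh interpreter with PYTHONPATH pointing at a temporary directory (standing in for the dags folder) and checks that the directory shows up in the child's sys.path. The helper name `child_sys_path` is made up for this example. One caveat, as far as I know: systemd treats an EnvironmentFile as plain KEY=VALUE assignments rather than a shell script, so a `source` line or a `$PYTHONPATH` reference may not behave there as it would in a shell, and writing the full value out is the safer choice.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ)
    env["PYTHONPATH"] = extra_dir
    output = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(output.decode("utf-8"))


extra_dir = os.path.realpath(tempfile.mkdtemp())  # stands in for the dags folder
paths = [os.path.realpath(p) for p in child_sys_path(extra_dir)]
print(extra_dir in paths)
```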
It looks like your Python environment is inconsistent: you have multiple Python installations on your VM (Python 3.6 and Python 2.7) and multiple copies of pip. The pip being used belongs to Python 3.6, but all of your modules are actually installed under Python 2.7.
This can be solved by using symbolic links to redirect to 2.7.
Run each of the following commands and see which version of Python starts (2.7.5, 2.7.14, 3.6, etc.):
python
python2
python2.7
Or run which python to find which Python executable your VM is using. You can also run which pip to see which pip is being used.
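The same check can be done from Python itself. This is a small sketch using `shutil.which` (available in Python 3.3 and later), which searches PATH much like the `which` command; the helper name `locate_commands` is just for illustration.

```python
import shutil


def locate_commands(names):
    """Map each command name to the executable found on PATH (or None)."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_commands(["python", "python2", "python2.7", "pip"]).items():
    print(name, "->", path)
```

Running it as the Airflow user, with the same PATH the service gets, shows what that account would actually resolve.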
I am going to assume that python and which python lead to Python 3 (which you do not want to use), but python2 and python2.7 lead to the installation you do want to use.
To create symbolic links so that /home/airflow/.local/lib/python2.7/ is used, run the following:
cd /home/airflow/.local/lib/python2.7
ln -s python2 python
ln -s /home/airflow/.local/lib/python2.7 python2
The symbolic link syntax is: ln -s TARGET LINK_NAME
You are essentially saying: when the command python is run, go to python2. When python2 is then run, go to /home/airflow/.local/lib/python2.7. It's all being redirected.
Now rerun the three commands above (python, python2, python2.7). All should lead to the Python installation you want.
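For anyone unsure about the argument order, here is a small sketch of a chained symbolic link built in a temporary directory, so nothing on the real system is touched. The directory names only mirror the ones above. It shows how link resolution works, nothing more; which interpreter a service actually runs still depends on its PATH and ExecStart.

```python
import os
import tempfile

base = os.path.realpath(tempfile.mkdtemp())
target = os.path.join(base, "python2.7")
os.mkdir(target)  # stands in for the real python2.7 directory

# Equivalent of: ln -s python2.7 python2  (target first, then link name)
os.symlink("python2.7", os.path.join(base, "python2"))
# Equivalent of: ln -s python2 python
os.symlink("python2", os.path.join(base, "python"))

# Following python -> python2 -> python2.7 ends at the real directory.
resolved = os.path.realpath(os.path.join(base, "python"))
print(resolved == target)
```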
Hope this helps!