I want to add DAG files to Airflow, which runs in Docker on Ubuntu. I used the following Git repository, which contains the configuration and a link to the Docker image. When I run
docker run -d -p 8080:8080 puckel/docker-airflow webserver
everything works fine, but I can't find a way to safely add DAGs to Airflow. Alternatively, I ran
docker run -d -p 8080:8080 puckel/docker-airflow webserver -v /root/dags:/usr/local/airflow/dags
with no success either.
I tried to edit /config/airflow.cfg
and add the Git credentials of a repository containing DAGs, without success. I also added a dags
folder at /home/root/dags
, containing DAGs, assuming that this folder is shared with the Docker container. But no success either.
The Docker Compose file contains the following volume settings:
webserver:
image: puckel/docker-airflow:1.10.0-2
...
volumes:
- ./dags:/usr/local/airflow/dags
But when I add files to ./dags
in the folder from which I run the Docker container, the DAGs don't appear in Airflow.
How can I safely add DAGs to Airflow when it runs in Docker?
By default, your airflow.cfg contains the following line:
dags_folder = /usr/local/airflow/dags
This tells Airflow to load DAGs from that folder; in your case, that path refers to a location inside the container.
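If you would rather not edit airflow.cfg, Airflow also reads any setting from an environment variable named AIRFLOW__SECTION__KEY. A minimal sketch (the paths are illustrative):
# override dags_folder via an environment variable and mount the host folder there
docker run -d -p 8080:8080 \
  -e AIRFLOW__CORE__DAGS_FOLDER=/usr/local/airflow/dags \
  -v /root/dags:/usr/local/airflow/dags \
  puckel/docker-airflow webserver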
Check that the database container is up and running and that airflow initdb
was executed; Airflow uses that metadata database to store the DAGs it loads.
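For example, assuming the webserver container is named webserver (the name is an assumption; check docker ps first), you could verify and initialize the database like this:
docker ps                                   # confirm the webserver and database containers are up
docker exec -it webserver airflow initdb    # (re)initialize the Airflow metadata database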
As far as I know, the Airflow scheduler picks up DAGs on every heartbeat, so make sure it has a reasonably short interval. In your airflow.cfg (in seconds):
scheduler_heartbeat_sec = 5
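Related: in Airflow 1.10 the [scheduler] section of airflow.cfg also has a dag_dir_list_interval setting that controls how often the dags folder is re-scanned for new files; for newly added DAGs this matters even more than the heartbeat. A sketch:
[scheduler]
scheduler_heartbeat_sec = 5
# how often the dags folder is scanned for new DAG files (seconds; default 300)
dag_dir_list_interval = 300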
It might also be helpful to check the Airflow logs inside the container for more insight. From your shell, you can run:
docker logs [container-id | container-name]
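For example (the container name below is an example; list yours first):
docker ps --format "{{.Names}}"             # list running container names
docker logs -f docker-airflow_webserver_1   # follow the webserver logs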
I hope this gives you some insight into the problem.
Adding a volume is the correct way. Note that the -v flag must come before the image name; anything placed after the image is passed to the container as arguments, which is why your second command had no effect:
docker run -d -p 8080:8080 -v /path/to/dags/on/your/local/machine/:/usr/local/airflow/dags puckel/docker-airflow webserver
A full explanation is given in the following post by Mark Nagelberg.
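To confirm the mount actually worked, you can list the folder from inside the container (the container name is a placeholder):
docker exec -it <container-name> ls /usr/local/airflow/dags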
I've been using Airflow in Docker for a while, and the loading and reloading of code is still a bit buggy. The best solution for me, every time I add a new DAG or modify the code of an existing one, is simply to restart the whole project (docker-compose up -d --build
) so the webserver, scheduler, and workers are all up to date.
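As a sketch, a full restart looks like this:
docker-compose down           # stop and remove the containers
docker-compose up -d --build  # rebuild and start webserver, scheduler and workers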
My Docker + Airflow setup works well; every DAG I add can be tested and run smoothly.
The approach is to expose the whole Airflow volume instead of only the dags folder:
webserver:
image: puckel/docker-airflow:1.10.0-2
...
volumes:
- ./airflow:/usr/local/airflow
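Note that a bind mount over /usr/local/airflow shadows everything the image ships at that path, so the host folder must contain the expected layout. A sketch (the container name is a placeholder):
mkdir -p ./airflow/dags
# copy the default airflow.cfg out of a running container first
docker cp <container-name>:/usr/local/airflow/airflow.cfg ./airflow/airflow.cfg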
Every time you add a DAG, check whether its name appears with the following command:
airflow list_dags
If it does not, double-check the newly added DAG Python file. Note that this command picks up the DAG file immediately, whereas the Airflow web UI usually lags by several seconds to minutes, depending on configuration and system load.
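If Airflow runs in Docker, execute the same check inside the container (the container name is a placeholder):
docker exec -it <container-name> airflow list_dags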