 

Airflow in Docker: how to add DAGs to Airflow?

I want to add DAG files to Airflow, which runs in Docker on Ubuntu. I used the following git repository, which contains the configuration and a link to the Docker image. When I run docker run -d -p 8080:8080 puckel/docker-airflow webserver, everything works fine, but I can't find a way to safely add DAGs to Airflow. I also tried docker run -d -p 8080:8080 puckel/docker-airflow webserver -v /root/dags:/usr/local/airflow/dags, without success.

I tried editing /config/airflow.cfg to add the git credentials for a repository containing DAGs, without success. I also added a dags folder containing DAGs at home/root/dags, assuming that this folder is shared with the Docker container, but that did not work either.

The Docker Compose file contains the following volume settings:

webserver:
        image: puckel/docker-airflow:1.10.0-2
        ...
        volumes:
            - ./dags:/usr/local/airflow/dags 

But when I add files to ./dags in the folder from which I run the Docker container, the DAGs don't appear in Airflow.

How can I safely add DAGs to Airflow when it runs in Docker?

Dendrobates asked Sep 16 '18 20:09


4 Answers

By default, your Airflow config contains the following line:

dags_folder = /usr/local/airflow/dags

This tells Airflow to load DAGs from that folder. In your case, the path refers to a location inside the container.

Check that the database container is up and running and that airflow initdb was executed. Airflow uses that metadata database to store the DAGs it loads.

As far as I know, the Airflow scheduler loads DAGs on every heartbeat, so make sure the heartbeat interval is reasonable. It is set in your airflow.cfg (in seconds):

scheduler_heartbeat_sec = 5
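
To confirm which values a container is actually using, the two settings above can be read back from airflow.cfg. The following is a minimal sketch using only Python's standard library. It assumes the file is INI-style with dags_folder under [core] and scheduler_heartbeat_sec under [scheduler]; the function name read_dag_settings and the sample text are illustrative and not part of Airflow.

```python
import configparser


def read_dag_settings(cfg_text):
    """Return (dags_folder, heartbeat_seconds) parsed from airflow.cfg text.

    Hypothetical helper for illustration; section names are an assumption
    based on the usual airflow.cfg layout.
    """
    # interpolation=None keeps '%' characters literal, since airflow.cfg
    # may contain log format strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    dags_folder = parser.get("core", "dags_folder")
    heartbeat = parser.getint("scheduler", "scheduler_heartbeat_sec")
    return dags_folder, heartbeat


# Sample text mirroring the two lines quoted in this answer.
SAMPLE_CFG = """
[core]
dags_folder = /usr/local/airflow/dags

[scheduler]
scheduler_heartbeat_sec = 5
"""

if __name__ == "__main__":
    print(read_dag_settings(SAMPLE_CFG))
```

In practice the text would come from the airflow.cfg inside the container rather than from a sample string.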

It might also be helpful to check the Airflow logs from the container. From your shell, you can run:

docker logs [container-id | container-name]

Hope this gave you some insights about your problem.

JaviOverflow answered Oct 20 '22 07:10


Adding a volume is the correct way:

docker run -d -p 8080:8080 -v /path/to/dags/on/your/local/machine/:/usr/local/airflow/dags  puckel/docker-airflow webserver

A full explanation is given in the following post by Mark Nagelberg.
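
The position of -v in the command above matters: docker run takes its options before the image name, and everything after the image name is passed to the container as its command and arguments. A -v placed after webserver is therefore never seen by Docker as a volume option. The sketch below builds the argument list in the working order; build_run_command is a hypothetical helper, and /root/dags is just the host path from the question.

```python
import shlex

IMAGE = "puckel/docker-airflow"


def build_run_command(host_dags, container_dags="/usr/local/airflow/dags"):
    """Build the docker run argv with the volume option placed before the image."""
    return [
        "docker", "run", "-d",
        "-p", "8080:8080",
        "-v", f"{host_dags}:{container_dags}",  # Docker option: must precede IMAGE
        IMAGE,
        "webserver",  # everything after IMAGE goes to the container
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe string for copy and paste.
    print(shlex.join(build_run_command("/root/dags")))
```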

skibee answered Oct 20 '22 08:10


I've been using Airflow in Docker for a while, and loading and reloading code is still a bit buggy. The best solution for me is to restart the whole project (docker-compose up -d --build) every time I add a new DAG or modify a DAG's code, so that the webserver, scheduler and workers are up to date.

pacuna answered Oct 20 '22 06:10


My Docker + Airflow setup works well. Every DAG I add can be tested and run smoothly.

The approach is:

  1. Expose the whole Airflow volume instead of only the dags folder.

webserver:
        image: puckel/docker-airflow:1.10.0-2
        ...
        volumes:
            - ./airflow:/usr/local/airflow
  2. Edit the dags folder setting in the Airflow configuration file (by default it needs no change, since the dags folder is under the airflow folder).
  3. Every time you add a DAG, check whether its name appears by running the following command:

    airflow list_dags

If it does not, double-check the newly added DAG Python file. Note that the command above checks the DAG file immediately, whereas the Airflow web UI usually lags by several seconds to minutes, depending on configuration or system load.
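
One thing worth checking in a new file that does not show up: as far as I understand, with safe mode enabled Airflow 1.10 only parses .py files whose contents include both the strings DAG and airflow, so a file missing either is skipped silently. The sketch below approximates that filter with the standard library. It is not Airflow's own code, it ignores zip archives and .airflowignore, and the helper names are made up for illustration.

```python
from pathlib import Path
import tempfile


def looks_like_dag_file(path):
    """Approximate the safe-mode filter: a .py file whose bytes contain
    both 'DAG' and 'airflow' (assumed behaviour, not Airflow's code)."""
    path = Path(path)
    if path.suffix != ".py":
        return False
    content = path.read_bytes()
    return all(marker in content for marker in (b"DAG", b"airflow"))


def skipped_files(dags_dir):
    """Return sorted names of .py files under dags_dir that the filter would skip."""
    return sorted(
        p.name for p in Path(dags_dir).rglob("*.py") if not looks_like_dag_file(p)
    )


if __name__ == "__main__":
    # Demonstrate on a throwaway folder instead of a real dags directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "good.py").write_text("from airflow import DAG\n")
        Path(tmp, "helper.py").write_text("print('no marker strings here')\n")
        print(skipped_files(tmp))
```

Pointing skipped_files at the host folder that is mounted into the container gives a quick list of files to inspect before looking at scheduler logs.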

Yong Wang answered Oct 20 '22 08:10