 

What is the best project structure to use when developing for Airflow?

I'm currently developing DAGs for Airflow. I like to use PyCharm and tend to spin up a virtual environment for each of my projects.

Airflow depends on an AIRFLOW_HOME folder that gets set during the installation. Subdirectories are then created within this folder by Airflow.

I'm interested in how others structure their projects to allow for virtual environments that contain packages (such as facebookads) that are needed for acquiring data - while also easily dropping the DAGs into Airflow's DAGS folder for testing.

asked Apr 26 '18 by James Lloyd

People also ask

How can you improve Airflow performance?

One approach is to increase the number of threads available on the machine that runs the scheduler process, so that the max_threads parameter can be set to a higher value. With a higher value, the Airflow scheduler can process a larger number of DAGs more effectively.
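As a minimal sketch, in Airflow 1.x this parameter lives in the [scheduler] section of airflow.cfg (in Airflow 2.0 it was renamed parsing_processes); the value below is illustrative, not a recommendation:

```ini
# airflow.cfg
[scheduler]
# Threads the scheduler uses to parse and schedule DAG files;
# raise only if the scheduler machine has spare CPU.
max_threads = 8
```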

What is the typical journey of a task in Airflow?

Ideally, a task should flow from none, to scheduled, to queued, to running, and finally to success.
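These states are exposed as constants in Airflow's Python API; a minimal sketch of that happy path, using airflow.utils.state.State:

```python
from airflow.utils.state import State

# The states a healthy task instance passes through, in order.
# State.NONE is literally None: the task has no recorded state yet.
HAPPY_PATH = [State.NONE, State.SCHEDULED, State.QUEUED, State.RUNNING, State.SUCCESS]

for state in HAPPY_PATH:
    print(state)
```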

Which of the following can be done using Apache Airflow?

Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and management of complex data pipelines from diverse sources.


2 Answers

In our current case, we follow a simple structure; a sketch of how a DAG imports from this layout follows the tree:

 - dags
   - dag001.py
   - dag002.py
   - helpers
     - dag_001_helpers
       - file01.py
       - file02.py
     - dag_002_helpers
       - file01.py
       - file02.py
   - configs
     - dag_001_configs
       - file11.json
       - file12.sql
     - dag_002_configs
       - file21.json
       - file22.py
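A minimal sketch of how a DAG in this layout might import its helpers, assuming helpers/ and dag_001_helpers/ are Python packages (each containing an __init__.py). Airflow adds the dags folder itself to sys.path, so the import resolves relative to it; the module and callable names below are illustrative:

```python
# dags/dag001.py
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x path

# Resolves because Airflow puts the dags folder on sys.path.
from helpers.dag_001_helpers import file01  # hypothetical helper module

with DAG(
    dag_id="dag_001",
    start_date=datetime(2018, 4, 1),
    schedule_interval="@daily",
) as dag:
    fetch = PythonOperator(
        task_id="fetch",
        python_callable=file01.run,  # hypothetical entry point in file01.py
    )
```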
answered Oct 22 '22 by Soliman


In my projects I use:

- config
  - config_1.yaml
  - config_1.env
- DAGs
  - dag_1.py
     - dag_1_etl_1.sql
     - dag_1_etl_2.sql
     - dag_1_etl_3.sql
     - dag_1_bash_1.sh
  - dag_2.py
  - dag_3.py
- operators
  - operator_1.py
  - operator_2.py
  - operator_3.py
- hooks
  - hooks_1.py

For our use case: 1) every object that can be reused is stored in a separate folder with other objects of the same kind;

2) every DAG's SQL must be self-contained, to avoid unmapped dependencies.
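A minimal sketch of point 2, assuming the DAG's .sql files sit in a folder next to the DAG file and using the DAG's template_searchpath argument so operators can reference them by filename; the connection id and paths are illustrative:

```python
# dags/dag_1.py
from datetime import datetime

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator  # Airflow 1.x path

with DAG(
    dag_id="dag_1",
    start_date=datetime(2018, 4, 1),
    schedule_interval="@daily",
    # Where Jinja looks for .sql templates; assumed location of this DAG's files.
    template_searchpath=["/path/to/dags/dag_1"],
) as dag:
    etl_1 = PostgresOperator(
        task_id="etl_1",
        postgres_conn_id="my_postgres",  # hypothetical connection id
        sql="dag_1_etl_1.sql",  # found via template_searchpath, rendered as a template
    )
```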

answered Oct 22 '22 by Flavio