
Airflow Dag Folder - How to ignore notebook checkpoints

Airflow is being too clever and tries to pick up DAGs from the Jupyter notebook checkpoints folder "dags/.ipynb_checkpoints/", which throws an error.

Is there a way to configure Airflow to ignore folders that match a certain pattern, like I would with .gitignore?

Thanks

Asked Dec 06 '18 22:12 by Glenn Sampson



1 Answer

You can create a .airflowignore file in your dags folder containing this line:

.ipynb_checkpoints

From the docs:

A .airflowignore file specifies the directories or files in DAG_FOLDER that Airflow should intentionally ignore. Each line in .airflowignore specifies a regular expression pattern, and directories or files whose names (not DAG id) match any of the patterns would be ignored (under the hood, re.findall() is used to match the pattern). Overall it works like a .gitignore file.
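To see why the one-line file from the answer fixes the checkpoint problem, here is a minimal sketch of the matching rule the docs describe: each non-empty line is a regex, and a name is ignored if `re.findall()` finds the pattern anywhere in it. The helper names `load_patterns` and `is_ignored` are hypothetical; this is a simplified model under that assumption, not Airflow's own code, which may match against full paths and can differ between versions.

```python
import re

# Contents of the .airflowignore file suggested in the answer (one pattern per line).
IGNORE_FILE_CONTENTS = ".ipynb_checkpoints\n"


def load_patterns(text):
    """Hypothetical helper: return the non-empty lines of an .airflowignore file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_ignored(name, patterns):
    """Hypothetical helper: True if any pattern is found anywhere in the name (re.findall)."""
    return any(re.findall(pattern, name) for pattern in patterns)


patterns = load_patterns(IGNORE_FILE_CONTENTS)
print(is_ignored(".ipynb_checkpoints", patterns))  # checkpoint folder is skipped
print(is_ignored("my_dag.py", patterns))           # a normal DAG file is still parsed
```

Because `.` is a regex metacharacter, the pattern `.ipynb_checkpoints` also matches names such as `Xipynb_checkpoints`; writing `\.ipynb_checkpoints` restricts it to a literal leading dot.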

.airflowignore file should be put in your DAG_FOLDER. For example, you can prepare a .airflowignore file with contents

project_a
tenant_[\d]

Then files like project_a_dag_1.py, TESTING_project_a.py, tenant_1.py, project_a/dag_1.py, and tenant_1/dag_1.py in your DAG_FOLDER would be ignored (If a directory’s name matches any of the patterns, this directory and all its subfolders would not be scanned by Airflow at all. This improves efficiency of DAG finding).
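The example above can be reproduced with a small directory walk. This sketch assumes the behavior as the docs describe it (regex search on each file or directory name, and ignored directories are never descended into) and prunes `os.walk`'s `dirs` list in place to model "not scanned at all". `scan_dag_folder` and the extra file `other_dag.py` are hypothetical additions for illustration; the real scanner in your Airflow version may differ in detail.

```python
import os
import re
import tempfile

# Patterns and file names taken from the docs example; other_dag.py is a hypothetical control file.
PATTERNS = ["project_a", r"tenant_[\d]"]
FILES = [
    "project_a_dag_1.py",
    "TESTING_project_a.py",
    "tenant_1.py",
    "project_a/dag_1.py",
    "tenant_1/dag_1.py",
    "other_dag.py",
]


def matches(name, patterns):
    """True if any regex pattern is found anywhere in the name."""
    return any(re.findall(p, name) for p in patterns)


def scan_dag_folder(dag_folder, patterns):
    """Hypothetical scanner: return relative paths of .py files that would be parsed."""
    found = []
    for root, dirs, files in os.walk(dag_folder):
        # Prune matching directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not matches(d, patterns)]
        for f in files:
            if f.endswith(".py") and not matches(f, patterns):
                found.append(os.path.relpath(os.path.join(root, f), dag_folder))
    return sorted(found)


with tempfile.TemporaryDirectory() as dag_folder:
    # Create empty files (and their parent directories) for the example layout.
    for rel in FILES:
        path = os.path.join(dag_folder, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    parsed = scan_dag_folder(dag_folder, PATTERNS)

print(parsed)  # ['other_dag.py']
```

Pruning `dirs` in place is what makes skipping a directory cheap: none of the files under `project_a/` or `tenant_1/` are even listed, which matches the efficiency note in the docs.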

The scope of a .airflowignore file is the directory it is in plus all its subfolders. You can also prepare .airflowignore file for a subfolder in DAG_FOLDER and it would only be applicable for that subfolder.
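Scoping can be modeled by giving each directory the patterns inherited from its parent plus those in its own `.airflowignore`. The sketch below assumes exactly that rule; `scan_with_scoped_ignores`, `read_ignore_file` and the `skip_everywhere` / `skip_in_sub` names are hypothetical, chosen only to show that a subfolder's file does not affect its parent.

```python
import os
import re
import tempfile


def read_ignore_file(directory):
    """Return the patterns in directory/.airflowignore, or [] if there is none."""
    path = os.path.join(directory, ".airflowignore")
    if not os.path.isfile(path):
        return []
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def scan_with_scoped_ignores(dag_folder):
    """Hypothetical scanner: each directory uses inherited patterns plus its own file."""
    found = []
    patterns_by_dir = {dag_folder: []}
    for root, dirs, files in os.walk(dag_folder):
        patterns = patterns_by_dir[root] + read_ignore_file(root)
        # Skip matching subdirectories; the rest inherit this directory's patterns.
        dirs[:] = [d for d in dirs if not any(re.findall(p, d) for p in patterns)]
        for d in dirs:
            patterns_by_dir[os.path.join(root, d)] = patterns
        for f in files:
            if f.endswith(".py") and not any(re.findall(p, f) for p in patterns):
                found.append(os.path.relpath(os.path.join(root, f), dag_folder))
    return sorted(found)


with tempfile.TemporaryDirectory() as dag_folder:
    layout = {
        ".airflowignore": "skip_everywhere\n",   # applies to the whole folder tree
        "skip_everywhere.py": "",
        "skip_in_sub.py": "",                    # not ignored: sub's pattern is out of scope here
        "sub/.airflowignore": "skip_in_sub\n",   # applies only inside sub/
        "sub/skip_in_sub.py": "",
        "sub/skip_everywhere.py": "",
        "sub/dag.py": "",
    }
    for rel, content in layout.items():
        path = os.path.join(dag_folder, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write(content)
    parsed = scan_with_scoped_ignores(dag_folder)

print(parsed)  # ['skip_in_sub.py', 'sub/dag.py'] on POSIX systems
```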

Answered Sep 20 '22 14:09 by kaxil
