Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get all Airflow Leaf Nodes/Tasks

Tags:

airflow

I want to build something where I need to capture all of the leaf tasks and add a downstream dependency to them to make a job complete in our database. Is there an easy way to find all the leaf nodes of a DAG in Airflow?

like image 429
Ace Haidrey Avatar asked Apr 20 '17 21:04

Ace Haidrey


People also ask

How many tasks can Airflow handle?

You can also tune your worker_concurrency (environment variable: AIRFLOW__CELERY__WORKER_CONCURRENCY ), which determines how many tasks each Celery worker can run at any given time. By default, the Celery executor runs a maximum of sixteen tasks concurrently.

Is Start_date mandatory in Airflow DAG?

This is no longer required. Airflow will now auto align the start_date and the schedule , by using the start_date as the moment to start looking.


1 Answers

Use upstream_task_ids and downstream_task_ids @property from BaseOperator

def get_start_tasks(dag: DAG) -> List[BaseOperator]:
    # returns list of "head" / "root" tasks of DAG
    return [task for task in dag.tasks if not task.upstream_task_ids]


def get_end_tasks(dag: DAG) -> List[BaseOperator]:
    # returns list of "leaf" tasks of DAG
    return [task for task in dag.tasks if not task.downstream_task_ids]

Type-Annotations from Python 3.6+


UPDATE-1

Now Airflow DAG model has powerful @property functions like

  • leaves
  • roots
  • topological_sort
like image 163
y2k-shubham Avatar answered Oct 04 '22 02:10

y2k-shubham