I'm building something where I need to capture all of the leaf tasks of a DAG and add a downstream dependency to them that marks a job complete in our database. Is there an easy way to find all the leaf nodes of a DAG in Airflow?
Use the `upstream_task_ids` and `downstream_task_ids` `@property` attributes from `BaseOperator`: a task with an empty `downstream_task_ids` set is a leaf.

```python
from typing import List

from airflow.models import DAG, BaseOperator


def get_start_tasks(dag: DAG) -> List[BaseOperator]:
    # returns list of "head" / "root" tasks of DAG (no upstream tasks)
    return [task for task in dag.tasks if not task.upstream_task_ids]


def get_end_tasks(dag: DAG) -> List[BaseOperator]:
    # returns list of "leaf" tasks of DAG (no downstream tasks)
    return [task for task in dag.tasks if not task.downstream_task_ids]
```

The type annotations require Python 3.6+.
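Applied to the question above, here's a minimal sketch that fans every current leaf into a single downstream task. The DAG, the `mark_job_complete` task, and its callable are hypothetical placeholders, not part of the original answer:

```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.python import PythonOperator  # airflow.operators.python_operator in 1.10


def mark_job_complete():
    # hypothetical: write the "job complete" row to your database here
    print("job complete")


with DAG(dag_id="example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    extract = PythonOperator(task_id="extract", python_callable=lambda: print("extract"))
    load = PythonOperator(task_id="load", python_callable=lambda: print("load"))
    extract >> load

    finalize = PythonOperator(task_id="mark_job_complete", python_callable=mark_job_complete)

    # snapshot the current leaves (same test as get_end_tasks above),
    # excluding finalize itself, then wire them upstream of it
    for leaf in [t for t in dag.tasks if not t.downstream_task_ids and t is not finalize]:
        leaf >> finalize
```

Note that `finalize` is itself a leaf until it's wired up, so the comprehension excludes it explicitly; the list is also materialized before the loop mutates any dependencies.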
UPDATE-1

The Airflow `DAG` model now ships these helpers built in: the `leaves` and `roots` properties and a `topological_sort()` method, so the custom functions above are no longer necessary on recent versions.
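A quick sketch of those helpers (verify the semantics against your Airflow version; in some 1.x releases `roots` confusingly returned the leaf tasks, which was fixed in 2.0):

```python
leaf_tasks = dag.leaves            # tasks with no downstream dependencies
root_tasks = dag.roots             # tasks with no upstream dependencies (Airflow 2.x semantics)
ordered = dag.topological_sort()   # all tasks, upstream before downstream
```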