 

What is the difference between airflow trigger rule "all_done" and "all_success"?


One of the requirements in the workflow I am working on is to wait for some event to happen for a given time; if it does not happen, the task should be marked as failed, but the downstream task should still be executed.

I am wondering if "all_done" means all the dependency tasks are done no matter if they have succeeded or not.
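Roughly, the DAG I have in mind looks like this (a sketch only; FileSensor stands in for the real event check, and the DAG id, file path, and task names are made up):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="wait_then_continue",       # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Fails after 10 minutes if the event marker never appears.
    wait_for_event = FileSensor(
        task_id="wait_for_event",
        filepath="/tmp/event_marker",  # hypothetical event to wait for
        poke_interval=30,
        timeout=600,
    )

    # trigger_rule="all_done" should let this run whether the sensor
    # succeeded or timed out and failed.
    downstream = PythonOperator(
        task_id="downstream",
        python_callable=lambda: print("running regardless of the sensor"),
        trigger_rule="all_done",
    )

    wait_for_event >> downstream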

asked Jan 16 '17 by samarth

People also ask

What are the trigger rules in Airflow?

Airflow's default trigger rule is “all_success”, which states that all of a task's dependencies must have completed successfully before the task itself can be executed.

Which is the default trigger rule?

The default value for trigger_rule is all_success and can be defined as "trigger this task when all directly upstream tasks have succeeded".

What is trigger rule?

Basically, a trigger rule defines the condition under which a task gets triggered. By default, every task has the trigger rule all_success, which means that if all parents of a task succeed, then the task gets triggered. Only one trigger rule can be specified for a given task.
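As an illustration, the rule is set per task through the trigger_rule argument (a sketch; EmptyOperator is Airflow 2.3+, older versions use DummyOperator, and TriggerRule is a convenience enum for the string values):

from airflow.operators.empty import EmptyOperator
from airflow.utils.trigger_rule import TriggerRule

# Inside a DAG definition: trigger_rule defaults to "all_success";
# override it per task.
join = EmptyOperator(task_id="join", trigger_rule=TriggerRule.ALL_DONE)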

What does trigger DAG do in Airflow?

When you trigger a DAG manually, Airflow performs a DAG run. For example, if you have a DAG that already runs on a schedule, and you trigger this DAG manually, then Airflow executes your DAG once, independently from the actual schedule specified for the DAG.
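Besides the UI's trigger button, a run can be started from the CLI (airflow dags trigger <dag_id> in Airflow 2) or through the stable REST API. The snippet below is a sketch against the REST API; the host, credentials, and DAG id are placeholders:

import requests

# Trigger one manual run of "example_dag" via Airflow 2's stable REST API.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/example_dag/dagRuns",
    auth=("admin", "admin"),  # hypothetical credentials
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json()["state"])  # typically "queued" right after triggering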


2 Answers

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#concepts-trigger-rules

all_done means all operations have finished working. Maybe they succeeded, maybe not.

all_success means all operations have finished without error.

So your guess is correct.
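To make the difference concrete, here is a minimal sketch (hypothetical DAG and task names) where one downstream task uses the default all_success and another uses all_done, both behind a deliberately failing task:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule


def boom():
    raise RuntimeError("simulated upstream failure")


with DAG(
    dag_id="trigger_rule_demo",        # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    failing = PythonOperator(task_id="failing", python_callable=boom)

    # Never runs: marked upstream_failed because "failing" did not succeed.
    needs_success = PythonOperator(
        task_id="needs_success",
        python_callable=lambda: print("all upstream succeeded"),
        trigger_rule=TriggerRule.ALL_SUCCESS,  # the default
    )

    # Runs anyway: "failing" finished, and finishing is all that
    # "all_done" requires.
    runs_regardless = PythonOperator(
        task_id="runs_regardless",
        python_callable=lambda: print("all upstream finished"),
        trigger_rule=TriggerRule.ALL_DONE,
    )

    failing >> [needs_success, runs_regardless]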

answered Oct 16 '22 by Sheena


SUMMARY
The tasks are "all done" if the count of upstream tasks in the SUCCESS, FAILED, UPSTREAM_FAILED, or SKIPPED states is greater than or equal to the count of all upstream tasks.

It is not clear why it would ever be greater; perhaps subdags do something odd to the counts.

Tasks are "all success" if the count of upstream tasks equals the count of successful upstream tasks.
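In code, the two checks described above amount to something like this (a paraphrase of the logic, not the actual Airflow source; all counts are over upstream task instances only):

# "all_done": every upstream task has reached a terminal state.
def all_done_met(successes, skipped, failed, upstream_failed, upstream):
    done = successes + skipped + failed + upstream_failed
    return done >= upstream

# "all_success": no upstream task ended in anything but success.
def all_success_met(successes, upstream):
    return (upstream - successes) == 0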

DETAILS
The code for evaluating trigger rules is here: https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/trigger_rule_dep.py#L72

1. ALL_DONE

The following code runs the query (qry) and unpacks its first row (the query is an aggregation that will only ever return one row anyway) into the following variables:

successes, skipped, failed, upstream_failed, done = qry.first() 

the "done" column in the query corresponds to this: func.count(TI.task_id) in other words a count of all the tasks matching the filter. The filter specifies that it is counting only upstream tasks, from the current dag, from the current execution date and this:

TI.state.in_([
    State.SUCCESS,
    State.FAILED,
    State.UPSTREAM_FAILED,
    State.SKIPPED,
])

So done is a count of the upstream tasks in one of those four states.

Later there is this code:

upstream = len(task.upstream_task_ids)
...
upstream_done = done >= upstream

And the actual trigger rule only fails on this:

if not upstream_done:
2. ALL_SUCCESS

The code is fairly straightforward and the concept is intuitive:

num_failures = upstream - successes
if num_failures > 0:
    ...  # the rule fails
answered Oct 16 '22 by Davos