
Airflow depends_on_past explanation

According to the official Airflow docs, the task instances directly upstream from the task need to be in a success state. Also, if you have set depends_on_past=True, the previous task instance needs to have succeeded (except if it is the first run for that task).

As we all know, a task is a kind of 'instantiated & parameterized' operator.

Now this is what confuses me. For example:

DAG: {op_1} -> {op_2} -> {op_3} 

{op_2} is a simple PythonOperator that takes 1 parameter from {op_1} and does stuff.

To my understanding, op_2(param_1) & op_2(param_2) are considered as 2 different tasks.

Given depends_on_past is set to True, then:

  1. If op_2(param_1) is still running, can op_2(param_2) be run?
  2. If op_2(param_1) failed in the previous run, can op_2(param_1) be run in the current run?
WeiHao asked Feb 07 '18 07:02


1 Answer

From the official docs for trigger rules:

depends_on_past (boolean) when set to True, keeps a task from getting triggered if the previous schedule for the task hasn’t succeeded.

So depends_on_past only matters when the previous run of the task did not succeed. If the previous run of your DAG executed its tasks successfully, depends_on_past does not affect the current run at all.

Meghdeep Ray answered Sep 21 '22 14:09