
What does the landing time mean in airflow?

Tags:

python

airflow

There is a section called "landing time" in the DAG view on the web console of airflow.

An example screen shot taken from airbnb's blog:

But what does it mean? There is no definition in the documentation or in the repository.

user6442810 asked Jun 08 '16 22:06



3 Answers

Since the existing answer here wasn't totally clear, and this is the top hit for "airflow landing time", I went to the chat archives and found the original answer being referenced here:

Maxime Beauchemin @mistercrunch Jun 09 2016 11:12 
it's the number of hours after the time the scheduling period ended
take a schedule_interval='@daily' run for 2016-01-01 that finishes at 2016-01-02 03:52:00
landing time is 3:52

https://gitter.im/apache/incubator-airflow/archives/2016/06/09

It seems the Y axis is in hours, and the negative landing times are a result of running jobs manually so they finish hours before they "should have finished" based on the schedule.
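The arithmetic in the quote can be sketched in plain Python. This is only an illustration of the definition given above, not Airflow's own code: the helper name `landing_time_hours` is hypothetical, and it assumes a fixed-length schedule interval expressed as a `timedelta`.

```python
from datetime import datetime, timedelta


def landing_time_hours(execution_date, schedule_interval, end_date):
    """Hours between the end of the scheduled period and task completion.

    Hypothetical helper for illustration; assumes a fixed-length interval.
    """
    period_end = execution_date + schedule_interval
    return (end_date - period_end).total_seconds() / 3600


daily = timedelta(days=1)
execution_date = datetime(2016, 1, 1)

# The '@daily' run for 2016-01-01 that finishes at 2016-01-02 03:52:00.
on_schedule = landing_time_hours(execution_date, daily, datetime(2016, 1, 2, 3, 52))

# The same run triggered manually and finished before the period ended.
manual_early = landing_time_hours(execution_date, daily, datetime(2016, 1, 1, 20, 0))

print(round(on_schedule, 2))  # 3.87 hours, i.e. 3:52
print(manual_early)           # -4.0
```

The second call shows how a manually triggered run that finishes before the period ends yields a negative value on the chart.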

Steve Wetzel answered Oct 02 '22 03:10


I asked the author, Maxime, directly. His answer was that landing_time is when the job completes minus when the job should have started (for Airflow, that is the end of the scheduled period).

source: http://gitter.im/apache/incubator-airflow It is a good place to get help, and Maxime is very nice and helpful. But the answers there are not persistent.

user6442810 answered Oct 02 '22 03:10


For me it's easier to understand landing_time using an example. So let's say we have a DAG scheduled to run daily at 0 0 * * *. This DAG has 2 tasks that execute sequentially:

first_task >> second_task

The first_task starts at 00:00:10 and finishes 5 minutes later, at 00:05:10. The landing_time for first_task will be 5 minutes 10 seconds.

The second_task starts at 00:07:00 and finishes 2 minutes later, at 00:09:00. The landing_time for second_task will be 9 minutes.

So we just subtract the time the run was due to start (the end of the scheduled period) from the time the task finished. I usually use landing_time as a metric for the performance of the whole Airflow system. For example, an increase in landing_times for the first tasks of a DAG suggests that the scheduler is under heavy load, or that we should adjust task parallelism (through airflow.cfg).

alexopoulos7 answered Oct 02 '22 01:10