 

Efficient ways of implementing waiting till a certain criterion is met in Airflow

Sensors in Airflow are a type of operator that keeps running until a certain criterion is met, but they consume a full worker slot while doing so. I'm curious whether people have been able to reliably use more efficient ways of implementing this.
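For reference, the blocking behaviour described above boils down to a loop like the one below. This is a plain-Python sketch, not Airflow's `BaseSensorOperator` source: `poke_until` is a hypothetical name, and `poke_interval` / `timeout` only mirror the names of the corresponding sensor parameters. The clock and sleep functions are injectable (an assumption made purely for illustration) so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def poke_until(condition: Callable[[], bool],
               poke_interval: float,
               timeout: float,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> int:
    """Block until condition() is True and return how many pokes were made.

    Hypothetical helper: whoever calls this (e.g. a worker process) stays busy
    for the whole wait, including every sleep between pokes.
    """
    start = clock()
    pokes = 0
    while True:
        pokes += 1
        if condition():
            return pokes
        # Give up once the total elapsed time reaches the timeout.
        if clock() - start >= timeout:
            raise TimeoutError(f"criterion not met after {pokes} pokes")
        sleep(poke_interval)
```

Every `sleep` in this loop happens while the caller still occupies its slot, which is exactly the cost the question is trying to avoid.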

A few ideas I have in mind:

  • using pools to restrict the number of worker slots allotted to sensors
  • skipping all downstream tasks, then clearing and resuming them via an external trigger
  • pausing the DAG run and resuming it via an external trigger
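To make the first idea concrete, here is a toy model (plain Python, not Airflow's API) of how capping sensors with a dedicated pool changes slot allocation. The names `allocate`, `SlotAllocation` and `sensor_pool_size` are hypothetical; the model assumes every running sensor holds exactly one worker slot and takes the worst case in which sensors grab slots before any other task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotAllocation:
    sensors_running: int
    sensors_queued: int
    slots_for_other_tasks: int


def allocate(worker_slots: int, pending_sensors: int,
             sensor_pool_size: Optional[int] = None) -> SlotAllocation:
    """Toy model of worst-case slot usage by sensors.

    Without a pool (sensor_pool_size=None) sensors may occupy every worker slot;
    with a pool, at most sensor_pool_size sensors run and the rest stay queued.
    """
    cap = worker_slots if sensor_pool_size is None else min(sensor_pool_size, worker_slots)
    running = min(pending_sensors, cap)
    return SlotAllocation(
        sensors_running=running,
        sensors_queued=pending_sensors - running,
        slots_for_other_tasks=worker_slots - running,
    )
```

With 16 worker slots and 20 waiting sensors, `allocate(16, 20)` leaves no slot for other tasks, while `allocate(16, 20, sensor_pool_size=4)` runs 4 sensors, queues 16 and keeps 12 slots free. The trade-off is that queued sensors start checking later, so a pool protects overall throughput but not sensor latency.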

Other relevant links:

  • How to implement polling in Airflow?

  • How to wait for an asynchronous event in a task of a DAG in a workflow implemented using Airflow?

  • Airflow unpause dag programmatically?
sharky asked Feb 09 '18 08:02



1 Answer

The new version of Airflow, namely 1.10.2, provides a new option for sensors, which I think addresses your concerns:

mode (str) – How the sensor operates. Options are: { poke | reschedule }, default is poke. When set to poke, the sensor takes up a worker slot for its whole execution time and sleeps between pokes. Use this mode if the expected runtime of the sensor is short or if a short poke interval is required. When set to reschedule, the sensor task frees the worker slot when the criterion is not yet met and is rescheduled at a later time. Use this mode if the expected time until the criterion is met is long. The poke interval should be more than one minute to prevent too much load on the scheduler.

Here is the link to the docs.
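The trade-off described in the docs can be sketched with a back-of-the-envelope model. This is not Airflow code: `simulate_sensor`, `SlotUsage` and `check_cost` are hypothetical, and the model assumes pokes happen exactly every `poke_interval` seconds starting at t=0, that each check takes `check_cost` seconds, and that rescheduling itself is free (in reality each reschedule adds scheduler work, which is why the docs advise a poke interval above one minute).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotUsage:
    pokes: int
    poke_mode_slot_seconds: float
    reschedule_mode_slot_seconds: float


def simulate_sensor(ready_at: float, poke_interval: float, check_cost: float) -> SlotUsage:
    """Estimate worker-slot occupancy of one sensor in poke vs reschedule mode (toy model)."""
    if poke_interval <= 0:
        raise ValueError("poke_interval must be positive")
    # Index k of the first poke (at t = k * poke_interval) that sees the criterion met.
    k = math.ceil(ready_at / poke_interval)
    pokes = k + 1  # pokes at t = 0, poke_interval, ..., k * poke_interval
    # Poke mode: the slot is held from t = 0 until the successful check finishes.
    poke_mode = k * poke_interval + check_cost
    # Reschedule mode: the slot is held only while a check is actually running.
    reschedule_mode = pokes * check_cost
    return SlotUsage(pokes, poke_mode, reschedule_mode)
```

For a criterion met after an hour with a 5-minute poke interval and 2-second checks, `simulate_sensor(3600, 300, 2)` gives 13 pokes, 3602 slot-seconds in poke mode and 26 in reschedule mode. The model deliberately omits the per-reschedule scheduler overhead; for short waits the absolute slot savings are small, so that overhead is what tips the balance back toward poke mode, consistent with the docs' advice.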

gorros answered Oct 07 '22 23:10
