
Airflow task to refer to multiple previous tasks?

Is there a way I can have a task require the completion of multiple upstream tasks which are still able to finish independently?

  • download_fcr --> process_fcr --> load_fcr
  • download_survey --> process_survey --> load_survey

create_dashboard should require load_fcr and load_survey to successfully complete.

I do not want to force anything in the 'survey' task chain to require anything from the 'fcr' task chain to complete. I want them to process in parallel and still complete even if one fails. However, the dashboard task requires both to finish loading to the database before it should start.

fcr *-->*-->*
             \
               ---> create_dashboard
                /
survey *-->*-->*
trench asked Mar 27 '17 17:03



2 Answers

You can pass a list of tasks to set_upstream or set_downstream. In your case, if you specifically want to use set_upstream, you could describe your dependencies as:

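# The dashboard waits for both load tasks.
create_dashboard.set_upstream([load_fcr, load_survey])

# fcr chain: download_fcr -> process_fcr -> load_fcr
load_fcr.set_upstream(process_fcr)
process_fcr.set_upstream(download_fcr)

# survey chain: download_survey -> process_survey -> load_survey
load_survey.set_upstream(process_survey)
process_survey.set_upstream(download_survey)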

Have a look at Airflow's source code: even when you pass a single task object to set_upstream, it wraps it in a list before doing anything else.
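To make the "single task or list" behaviour concrete, here is a minimal sketch in plain Python. The Task class below is a hypothetical toy stand-in, not Airflow's implementation; it only mimics the idea that a lone task is normalized to a one-element list before the dependencies are recorded.

```python
class Task:
    """Toy stand-in for an Airflow operator (hypothetical, not Airflow code)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_upstream(self, tasks):
        # Normalize the argument: a single Task becomes a one-element list.
        if isinstance(tasks, Task):
            tasks = [tasks]
        for other in tasks:
            self.upstream.add(other.task_id)
            other.downstream.add(self.task_id)


names = [
    "download_fcr", "process_fcr", "load_fcr",
    "download_survey", "process_survey", "load_survey",
    "create_dashboard",
]
t = {name: Task(name) for name in names}

# Same wiring as in the answer: one list call, then single-task calls.
t["create_dashboard"].set_upstream([t["load_fcr"], t["load_survey"]])
t["load_fcr"].set_upstream(t["process_fcr"])
t["process_fcr"].set_upstream(t["download_fcr"])
t["load_survey"].set_upstream(t["process_survey"])
t["process_survey"].set_upstream(t["download_survey"])

print(sorted(t["create_dashboard"].upstream))  # ['load_fcr', 'load_survey']
print(sorted(t["load_survey"].upstream))       # ['process_survey']
```

Nothing in the survey chain lists an fcr task as upstream, so the two chains stay independent and only create_dashboard joins them.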

flaviomax answered Oct 15 '22 12:10



# fcr chain
download_fcr.set_downstream(process_fcr)
process_fcr.set_downstream(load_fcr)

# survey chain
download_survey.set_downstream(process_survey)
process_survey.set_downstream(load_survey)

# Both chains feed the dashboard.
load_survey.set_downstream(create_dashboard)
load_fcr.set_downstream(create_dashboard)
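What this wiring means when one chain fails can be sketched with a small simulation. It is a rough model only, assuming Airflow's default trigger rule (all_success): a task runs once all of its upstream tasks have succeeded, and is otherwise marked upstream_failed. The run function and the upstream dictionary are hypothetical helpers written for this illustration, not part of Airflow.

```python
def run(upstream, outcomes):
    """Return the final state of every task.

    upstream: task_id -> list of task_ids it waits on
    outcomes: task_id -> False if the task fails when it runs (default: succeeds)
    """
    states = {}
    pending = list(upstream)
    while pending:
        for task in list(pending):
            deps = upstream[task]
            if any(dep not in states for dep in deps):
                continue  # some upstream task has not finished yet
            if all(states[dep] == "success" for dep in deps):
                states[task] = "success" if outcomes.get(task, True) else "failed"
            else:
                states[task] = "upstream_failed"
            pending.remove(task)
    return states


# Dependency map equivalent to the set_downstream calls above.
upstream = {
    "download_fcr": [],
    "process_fcr": ["download_fcr"],
    "load_fcr": ["process_fcr"],
    "download_survey": [],
    "process_survey": ["download_survey"],
    "load_survey": ["process_survey"],
    "create_dashboard": ["load_survey", "load_fcr"],
}

# Suppose process_fcr fails; every other task would succeed if it ran.
states = run(upstream, {"process_fcr": False})
for name in ("load_survey", "load_fcr", "create_dashboard"):
    print(name, states[name])
```

In this model the survey chain still finishes when the fcr chain breaks, while create_dashboard is held back because one of its two upstream tasks never succeeded.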
jhnclvr answered Oct 15 '22 13:10
