Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can we set priority_weight per task in Airflow?

Tags:

airflow

I know that priority_weight can be set for the DAG in the default_args as per the example in the official documentation here.

Can we also set priority_weight that is different for each task in the DAG?

Following the example from the tutorial, it would mean that t1 would have a different priority from t2.

like image 424
Newskooler Avatar asked Mar 13 '19 15:03

Newskooler


People also ask

How do I set dependencies between tasks in Airflow?

Basic dependencies between Airflow tasks can be set in the following ways: Using bitshift operators ( << and >> ) Using the set_upstream and set_downstream methods.

How do you run a task parallelly in Airflow?

By default, Airflow uses SequentialExecutor which would execute task sequentially no matter what. So to allow Airflow to run tasks in Parallel you will need to create a database in Postges or MySQL and configure it in airflow. cfg ( sql_alchemy_conn param) and then change your executor to LocalExecutor in airflow.

How many tasks can Airflow handle?

You can also tune your worker_concurrency (environment variable: AIRFLOW__CELERY__WORKER_CONCURRENCY ), which determines how many tasks each Celery worker can run at any given time. By default, the Celery executor runs a maximum of sixteen tasks concurrently.

How many tasks can run in parallel Airflow?

Apache Airflow's capability to run parallel tasks, ensured by using Kubernetes and CeleryExecutor, allows you to save a lot of time. You can use it to execute even 1000 parallel tasks in only 5 minutes.


1 Answers

Can we also set priority_weight that is different for each task in the DAG?

Short Answer

Yes


Long Version

You appear a little confused here. Citing the passage above the snippet in the given link:

..we have the choice to explicitly pass a set of arguments to each task’s constructor (which would become redundant), or (better!) we can define a dictionary of default parameters that we can use when creating tasks..

So now you must have inferred that The priority_weight that was being passed in default_args was actually meant for individual tasks and not the DAG itself. Of course looking at the code it becomes clear that it's a parameter of BaseOperator and not DAG SQLAlchemy model


Also once you get to know the above fact, you'll soon realize that it wouldn't make much sense to assign same priority to each task of DAG. The said example from the official docs indeed appears to have overlooked this simple reasoning (unless I'm missing something). Nevertheless the docstring does seem to indicate so

:param priority_weight: priority weight of this task against other task.
        This allows the executor to trigger higher priority tasks before
        others when things get backed up.

UPDATE-1

As rightly pointed out by @Alessandro S. in comments, assigning same priority_weight to all tasks within a DAG is NOT unreasonable after all since priority_weight is not enforced on DAG-level but on pool level

  • So when you take 2 (or more) dags into picture (both accessing same external resource) then a valid use-case could be that you want to promote all tasks of one dag over other one
  • To realize this, all tasks of first dag can be a single value of priority_weight which is higher than that of tasks in second dag.
like image 56
y2k-shubham Avatar answered Oct 07 '22 06:10

y2k-shubham