I know that priority_weight
can be set for the DAG
in the default_args
as per the example in the official documentation here.
Can we also set priority_weight
that is different for each task in the DAG
?
Following the example from the tutorial, it would mean that t1
would have a different priority from t2
.
Basic dependencies between Airflow tasks can be set in the following ways: Using bitshift operators ( << and >> ) Using the set_upstream and set_downstream methods.
By default, Airflow uses SequentialExecutor which would execute task sequentially no matter what. So to allow Airflow to run tasks in Parallel you will need to create a database in Postges or MySQL and configure it in airflow. cfg ( sql_alchemy_conn param) and then change your executor to LocalExecutor in airflow.
You can also tune your worker_concurrency (environment variable: AIRFLOW__CELERY__WORKER_CONCURRENCY ), which determines how many tasks each Celery worker can run at any given time. By default, the Celery executor runs a maximum of sixteen tasks concurrently.
Apache Airflow's capability to run parallel tasks, ensured by using Kubernetes and CeleryExecutor, allows you to save a lot of time. You can use it to execute even 1000 parallel tasks in only 5 minutes.
Can we also set priority_weight that is different for each task in the DAG?
Short Answer
Yes
Long Version
You appear a little confused here. Citing the passage above the snippet in the given link:
..we have the choice to explicitly pass a set of arguments to each task’s constructor (which would become redundant), or (better!) we can define a dictionary of default parameters that we can use when creating tasks..
So now you must have inferred that The priority_weight
that was being passed in default_args
was actually meant for individual task
s and not the DAG
itself. Of course looking at the code it becomes clear that it's a parameter of BaseOperator
and not DAG
SQLAlchemy
model
Also once you get to know the above fact, you'll soon realize that it wouldn't make much sense to assign same priority to each task
of DAG
. The said example from the official docs indeed appears to have overlooked this simple reasoning (unless I'm missing something). Nevertheless the docstring does seem to indicate so
:param priority_weight: priority weight of this task against other task. This allows the executor to trigger higher priority tasks before others when things get backed up.
UPDATE-1
As rightly pointed out by @Alessandro S. in comments, assigning same priority_weight
to all tasks within a DAG is NOT unreasonable after all since priority_weight
is not enforced on DAG-level but on pool
level
priority_weight
which is higher than that of tasks in second dag.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With