We're using Airflow 1.10.0, and while analyzing why some of our ETL processes take so long, we saw that subdags use a SequentialExecutor instead of the default executor (BaseExecutor) or the CeleryExecutor we configured.
I would like to know if this is a bug or expected behavior of Airflow. It doesn't make sense to have the capability to execute tasks in parallel and then lose it for one specific kind of task.
CeleryExecutor is one of the ways you can scale out the number of workers. For this to work, you need to set up a Celery backend (RabbitMQ, Redis, …), change your airflow.cfg to point the executor parameter to CeleryExecutor, and provide the related Celery settings.
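As a concrete illustration, here is a minimal sketch of the relevant airflow.cfg entries, assuming Redis as the broker and Postgres as the metadata database and result backend; all connection strings below are placeholders, and exact key names can vary slightly between Airflow versions:

[core]
# switch away from the default SequentialExecutor
executor = CeleryExecutor
# Celery needs a real database; SQLite only supports the SequentialExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

[celery]
# placeholder connection strings; point these at your own services
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow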
Apache Airflow's ability to run tasks in parallel, provided by executors such as the CeleryExecutor or by running on Kubernetes, can save a lot of time: a sufficiently scaled-out deployment can execute on the order of 1,000 parallel tasks in a few minutes.
By default, Airflow uses the SequentialExecutor, which executes tasks sequentially no matter what. To allow Airflow to run tasks in parallel, you need to create a database in Postgres or MySQL, configure it in airflow.cfg (the sql_alchemy_conn parameter), and then change your executor to LocalExecutor in airflow.cfg.
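For example, a minimal sketch of that airflow.cfg change, assuming a local Postgres database (the connection string is a placeholder):

[core]
# LocalExecutor runs task instances as parallel subprocesses on one machine
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
# optional concurrency caps (these are Airflow's defaults)
parallelism = 32
dag_concurrency = 16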
To set up the Airflow Celery Executor, you first need to set up a Celery backend using a message broker service such as RabbitMQ or Redis. After that, change the airflow.cfg file to point the executor parameter to CeleryExecutor and enter all the required configurations for it.
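You also need at least one worker process consuming tasks from the broker; in Airflow 1.10 you start one on each worker machine with (assuming Airflow is installed and configured there):

airflow worker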
It is a typical pattern to use the SequentialExecutor in subdags, the idea being that you are often executing a lot of similar, related tasks and don't necessarily want the added overhead of queuing each of them through Celery, etc. See the "other tips" section in the Airflow docs for subdags: https://airflow.apache.org/concepts.html#subdags
By default, subdags are set to use the SequentialExecutor (see: https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/operators/subdag_operator.py#L38), but you can change that.
To use the CeleryExecutor instead, pass it in when creating your subdag operator:
from airflow.executors.celery_executor import CeleryExecutor

mysubdag = SubDagOperator(
    executor=CeleryExecutor(),
    # ... task_id, subdag, dag, and the rest of your SubDagOperator arguments
)
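For context, here is a minimal self-contained sketch of where that argument fits, assuming a trivial subdag factory; the DAG ids, schedule, and dummy tasks below are made up for illustration:

from datetime import datetime

from airflow import DAG
from airflow.executors.celery_executor import CeleryExecutor
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {"owner": "airflow", "start_date": datetime(2018, 1, 1)}

def make_subdag(parent_dag_id, child_dag_id, args):
    # subdag dag_ids must follow the <parent_id>.<child_id> convention
    subdag = DAG(
        dag_id="%s.%s" % (parent_dag_id, child_dag_id),
        default_args=args,
        schedule_interval="@daily",  # keep in step with the parent DAG
    )
    # independent dummy tasks, so there is actually something to parallelize
    for i in range(5):
        DummyOperator(task_id="task_%d" % i, dag=subdag)
    return subdag

with DAG(dag_id="parent_dag", default_args=default_args,
         schedule_interval="@daily") as dag:
    SubDagOperator(
        task_id="my_subdag",
        subdag=make_subdag("parent_dag", "my_subdag", default_args),
        executor=CeleryExecutor(),  # override the SequentialExecutor default
    )

With this in place, the dummy tasks inside the subdag are queued to your Celery workers like any other task instead of running one at a time.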
Maybe a little bit late, but using the LocalExecutor works for me:
from airflow.executors.local_executor import LocalExecutor

subdag = SubDagOperator(
    task_id=task_id,
    subdag=child_dag,  # the nested DAG object (required; child_dag is a placeholder name)
    default_args=default_args,
    executor=LocalExecutor(),
    dag=dag,
)