We moved away from the CeleryExecutor in Airflow 1.10.0 because of some execution limitations, and right now we're using the KubernetesExecutor.
Right now we're not able to parallelize all the tasks in some DAGs, even when we change the subdag_operator default executor in the code directly: https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/operators/subdag_operator.py#L38
Our expectation was that, with these modifications and the KubernetesExecutor, we could fan out the execution of all tasks at the same time, but we still see the same behavior as the SequentialExecutor.
This is the behavior we have right now: the subdag tasks run one after another. We would like to execute all of them at the same time using the KubernetesExecutor.
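For reference, patching the default executor in subdag_operator.py should be equivalent to passing an executor to the operator itself, which SubDagOperator accepts in Airflow 1.10. The sketch below only illustrates our setup, not the real pipeline (the DAG id, task ids, and the ten dummy tasks are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.executors.local_executor import LocalExecutor
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {"owner": "airflow", "start_date": datetime(2019, 1, 1)}


def build_subdag(parent_dag_id, child_task_id, default_args):
    # Ten independent tasks: no dependencies between them, so all of them
    # should be eligible to run at the same time.
    subdag = DAG(
        dag_id="{}.{}".format(parent_dag_id, child_task_id),
        default_args=default_args,
        schedule_interval=None,
    )
    for i in range(10):
        DummyOperator(task_id="task_{}".format(i), dag=subdag)
    return subdag


with DAG(dag_id="fan_out_example", default_args=DEFAULT_ARGS,
         schedule_interval=None) as dag:
    SubDagOperator(
        task_id="fan_out",
        subdag=build_subdag("fan_out_example", "fan_out", DEFAULT_ARGS),
        # Equivalent to the source change: avoid the SequentialExecutor
        # default so the subdag's tasks are not forced to run one by one.
        executor=LocalExecutor(),
    )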
The Kubernetes executor runs each task instance in its own pod on a Kubernetes cluster. KubernetesExecutor runs as a process in the Airflow Scheduler. The scheduler itself does not necessarily need to be running on Kubernetes, but does need access to a Kubernetes cluster.
Airflow comes configured with the SequentialExecutor by default, which is a local executor, and the safest option for execution, but we strongly recommend you change this to LocalExecutor for small, single-machine installations, or one of the remote executors for a multi-machine/cloud installation.
The Executor acts as a middle man to handle resource utilization and how to distribute work best. Although an Airflow job is organized at the DAG level, the execution phase of a job is more granular, and the Executor runs at the task level.
The Airflow local settings file (airflow_local_settings.py) can define a pod_mutation_hook function that has the ability to mutate pod objects before sending them to the Kubernetes client for scheduling. It receives a single argument, a reference to the pod object, and is expected to alter its attributes.
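As a minimal sketch of such a hook, assuming the Airflow 1.10-style Pod object where labels and envs are plain dict attributes (newer releases pass a Kubernetes V1Pod instead, so the attribute names would differ):

# airflow_local_settings.py
def pod_mutation_hook(pod):
    # Mutate the worker pod in place before it is handed to the Kubernetes client.
    pod.labels = pod.labels or {}
    pod.labels["mutated-by"] = "pod_mutation_hook"  # hypothetical label
    # The same parallelism override shown in the YAML below could also be
    # injected here instead of editing the pod template.
    pod.envs = pod.envs or {}
    pod.envs["AIRFLOW__CORE__PARALLELISM"] = "10"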
The Kubernetes Executor in Airflow turns each first-level task into a worker pod that runs a Local Executor. This means it is the Local Executor that actually executes your SubDagOperator.
To run the tasks under the SubDagOperator in parallel after the worker pod is spawned, you need to set the parallelism configuration for the worker pod. So, if you are using the YAML format for the worker pod, you will need to edit it to something like this:
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  containers:
    - args: []
      command: []
      env:
        ###################################
        # This is the part you need to add
        ###################################
        - name: AIRFLOW__CORE__PARALLELISM
          value: "10"
        ###################################
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor
        # Hard Coded Airflow Envs
        - name: AIRFLOW__CORE__FERNET_KEY
          valueFrom:
            secretKeyRef:
              name: RELEASE-NAME-fernet-key
              key: fernet-key
        - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
          valueFrom:
            secretKeyRef:
              name: RELEASE-NAME-airflow-metadata
              key: connection
        - name: AIRFLOW_CONN_AIRFLOW_DB
          valueFrom:
            secretKeyRef:
              name: RELEASE-NAME-airflow-metadata
              key: connection
      envFrom: []
      image: dummy_image
      imagePullPolicy: IfNotPresent
      name: base
      ports: []
      volumeMounts:
        - mountPath: "/opt/airflow/logs"
          name: airflow-logs
        - mountPath: /opt/airflow/dags
          name: airflow-dags
          readOnly: false
        - mountPath: /opt/airflow/dags
          name: airflow-dags
          readOnly: true
          subPath: repo/tests/dags
  hostNetwork: false
  restartPolicy: Never
  securityContext:
    runAsUser: 50000
  nodeSelector: {}
  affinity: {}
  tolerations: []
  serviceAccountName: 'RELEASE-NAME-worker-serviceaccount'
  volumes:
    - name: airflow-dags
      persistentVolumeClaim:
        claimName: RELEASE-NAME-dags
    - emptyDir: {}
      name: airflow-logs
    - configMap:
        name: RELEASE-NAME-airflow-config
      name: airflow-config
    - configMap:
        name: RELEASE-NAME-airflow-config
      name: airflow-local-settings
Then the SubDagOperator will follow the specified parallelism and run its tasks in parallel: because the worker pod runs a Local Executor, AIRFLOW__CORE__PARALLELISM caps how many task instances that executor runs at the same time.