Integration of Kubernetes with Apache Airflow

We are building a workflow scheduling application. We found Airflow to be a good option for the workflow manager and Kubernetes a good option for the cluster manager. The flow would be:

  1. We will submit a workflow DAG to Airflow.
  2. Airflow should submit the tasks of a given DAG to Kubernetes, specifying a Docker image for each.
  3. Kubernetes should execute each task by running a Docker container on an available EC2 worker node of the cluster.
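To make step 2 concrete, here is a minimal sketch of what "submitting a task by specifying a Docker image" could reduce to: a Kubernetes Pod manifest with a single container. This is not Airflow's API. The function name `build_pod_manifest`, the label keys, and the example DAG, task, and image names are all hypothetical and chosen only for illustration. The manifest fields themselves (`apiVersion`, `kind`, `metadata`, `spec.containers`, `restartPolicy`) are standard Kubernetes Pod fields.

```python
import json


def build_pod_manifest(dag_id, task_id, image, command=None, namespace="default"):
    """Return a Kubernetes Pod manifest (a plain dict) that runs one task in one container.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    # Pod names may not contain underscores, so normalise the ids.
    name = f"{dag_id}-{task_id}".lower().replace("_", "-")

    container = {"name": "task", "image": image}
    if command:
        # Overrides the image's entrypoint, as in a normal Pod spec.
        container["command"] = list(command)

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Labels (assumed keys) make it easy to find the pod for a given task.
            "labels": {"dag_id": dag_id, "task_id": task_id},
        },
        "spec": {
            # A task should run once and exit, not be restarted by the kubelet.
            "restartPolicy": "Never",
            "containers": [container],
        },
    }


if __name__ == "__main__":
    manifest = build_pod_manifest(
        "etl_daily", "extract", "python:3.11-slim", ["python", "-c", "print('hello')"]
    )
    print(json.dumps(manifest, indent=2))
```

A manifest like this could be saved to a file and applied with `kubectl apply -f`, or passed as the `body` argument to `CoreV1Api.create_namespaced_pod` in the Kubernetes Python client. Kubernetes then schedules the pod onto an available worker node, which corresponds to step 3.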

On searching, we found that Airflow has Operators for integrating with ECS and Mesos, but not for Kubernetes. We did find a request for a Kubernetes Operator on the Airflow wiki, but no further update on it.

So, simply put, the question is: how do we integrate Airflow with Kubernetes?

Free Coder asked Jan 24 '18 03:01



1 Answer

This is in flight right now. You can follow along with this major Jira ticket.

One of the more stable branches (much of the work is being led by this team) is the airflow-kubernetes-executor branch in the Bloomberg fork on GitHub, though it is in the process of being rebased off a constantly moving Airflow master.

I have a branch on my fork, called frankensteins-monster, that addresses many of the short-term issues and runs well enough. Use it at your own risk, though it works for me right now. I am building a Docker image using the build.sh script located in scripts/ci/kubernetes/docker.

Good luck!

gurooj answered Sep 26 '22 01:09