We are building a workflow scheduling application. We found Airflow to be a good option for the workflow manager and Kubernetes a good option for the cluster manager. Thus, the flow would be,
On searching, we found that Airflow has Operators for integrating with ECS and Mesos, but not for Kubernetes. We did find a request for a Kubernetes Operator on the Airflow wiki, but no further update on it.
So, simply put, the question is: how do we integrate Airflow with Kubernetes?
Apache Airflow aims to be a very Kubernetes-friendly project, and many users run Airflow from within a Kubernetes cluster in order to take advantage of the increased stability and autoscaling options that Kubernetes provides.
Benefits of the Airflow Kubernetes Operator: Custom Docker images let users ensure that a task's configuration, environment, and dependencies are completely idempotent. Increased flexibility for deployments: Airflow's plugin API has always provided a significant boon to engineers wishing to test new features in their DAGs.
Apache Kafka is frequently deployed on the Kubernetes container management system, which is used to automate deployment, scaling, and operation of containers across clusters of hosts.
We maintain an official Helm chart for Airflow that helps you define, install, and upgrade a deployment.
As you know, Helm allows you to deploy applications on Kubernetes, whereas kubectl allows you to run commands against your Kubernetes cluster. Without kubectl, you won’t be able to get the logs of your pods, debug your errors, or check your nodes. That’s it. To run Airflow on Kubernetes you need 5 tools: Docker, Docker Compose, KinD, Helm and kubectl.
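Before starting, it can help to confirm that the five tools are actually on your PATH. The following is a minimal sketch using only the Python standard library. The executable names in `REQUIRED_TOOLS` are assumptions (for example, newer Docker installs ship Compose as the `docker compose` plugin rather than a separate `docker-compose` binary), so adjust them for your setup.

```python
import shutil

# Assumed executable names for the five tools; adjust for your platform.
REQUIRED_TOOLS = ["docker", "docker-compose", "kind", "helm", "kubectl"]


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found):
    """Return the sorted names of tools whose path could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:15} {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Missing tools:", ", ".join(missing))
```

This only checks that the executables can be found; it does not verify versions or that the Docker daemon is running.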
This means that the Airflow workers will never have access to this information, and can simply request that pods be built with only the secrets they need. The Kubernetes Operator uses the Kubernetes Python Client to generate a request that is processed by the APIServer (1).
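To make the idea of "pods built with only the secrets they need" more concrete, here is a rough sketch of what such a pod request could look like as a plain Python dictionary. This is not the operator's actual code; the helper names (`secret_env`, `build_pod_manifest`), the container name `base`, and the example secret `db-credentials` are all hypothetical. The point is that the manifest refers to a Kubernetes Secret by name and key through `secretKeyRef`, so the secret value itself never appears on the Airflow side.

```python
import json


def secret_env(env_name, secret_name, secret_key):
    """Env var entry that references a Kubernetes Secret by name and key."""
    return {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": secret_key}},
    }


def build_pod_manifest(name, image, command, secret_envs=(), namespace="default"):
    """Build a minimal single-container Pod manifest (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",  # a task pod should run once and exit
            "containers": [
                {
                    "name": "base",
                    "image": image,
                    "command": list(command),
                    "env": list(secret_envs),
                }
            ],
        },
    }


# Example values below are made up for illustration.
manifest = build_pod_manifest(
    name="example-task",
    image="python:3.11-slim",
    command=["python", "-c", "print('hello')"],
    secret_envs=[secret_env("DB_PASSWORD", "db-credentials", "password")],
)
print(json.dumps(manifest, indent=2))
```

A dictionary like this could then be submitted to the API server, for instance with the Kubernetes Python client's `CoreV1Api.create_namespaced_pod`, assuming the client is installed and configured for your cluster.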
As part of Bloomberg's continued commitment to developing the Kubernetes ecosystem, we are excited to announce the Kubernetes Airflow Operator: a mechanism for Apache Airflow, a popular workflow orchestration framework, to natively launch arbitrary Kubernetes Pods using the Kubernetes API.
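In released versions of Airflow this operator is available as `KubernetesPodOperator`. Below is a small sketch of how a task that launches a pod might be wired up. It assumes the `apache-airflow-providers-cncf-kubernetes` package is installed; the import path shown is the one used by recent provider versions (older ones expose it under `operators.kubernetes_pod`). The helper names, task id, image, and namespace are placeholders, not anything from the original announcement.

```python
def pod_task_kwargs(task_id="hello_pod", image="python:3.11-slim"):
    """Keyword arguments for a KubernetesPodOperator task (placeholder values)."""
    return {
        "task_id": task_id,
        # Pod names must be DNS-compatible, so underscores become dashes.
        "name": task_id.replace("_", "-"),
        "namespace": "default",
        "image": image,
        "cmds": ["python", "-c"],
        "arguments": ["print('hello from a pod')"],
        "get_logs": True,  # stream the pod's stdout into the Airflow task log
    }


def make_pod_task(dag, **overrides):
    """Create the operator; imported lazily so Airflow is only needed here."""
    # Assumed import path for recent provider versions.
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )

    kwargs = pod_task_kwargs()
    kwargs.update(overrides)
    return KubernetesPodOperator(dag=dag, **kwargs)
```

Inside a DAG file you would call `make_pod_task(dag)` with your own `DAG` object. Keeping the arguments in a plain dictionary makes them easy to inspect or override per task.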
This is in flight right now. You can follow along with this major JIRA ticket
One of the more stable branches (a lot of this team is leading the work) is in the Bloomberg fork on GitHub, in the airflow-kubernetes-executor branch, though it is in the process of being rebased off a constantly moving Airflow master.
I have a branch on my fork, called frankensteins-monster, that addresses many of the short-term issues and runs well enough. Use it at your own risk, though it works for me right now. I am building a Docker image using the build.sh script located in scripts/ci/kubernetes/docker.
Good luck!