I am new to job schedulers and was looking out for one to run jobs on big data cluster. I was quite confused with the available choices. Found Oozie to have many limitations as compared to the already existing ones such as TWS, Autosys, etc.
Need some comparison points on Oozie vs. Airflow.
Appreciate your help.
The Airflow UI is much better than Hue (Oozie UI),for example: Airflow UI has a Tree view to track task failures unlike Hue, which tracks only job failure. The Airflow UI also lets you view your workflow code, which the Hue UI does not.
Oozie additionally supports subworkflow and allows workflow node properties to be parameterized and dynamically evaluated using EL function. In contrast, Airflow is a generic workflow orchestration for programmatically authoring, scheduling, and monitoring workflows.
What is Airflow Used For? Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and managing complex data pipelines from diverse sources.
In my experience Airflow is the best data pipeline right now. It's best suited for managing complex, long running workflows. UI and modularity are over the top.
Airflow
Oozie
As you see, Airflow is an easier to use (especially in large heteregenoeus team), more versatile and powerful option than Oozie.
As I said: go with Airflow.
Article you may find interesting
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With