
Which one to choose: Apache Oozie or Apache Airflow? Need a comparison


I am new to job schedulers and am looking for one to run jobs on a big data cluster. I am quite confused by the available choices. I found Oozie to have many limitations compared to established schedulers such as TWS, Autosys, etc.

Need some comparison points on Oozie vs. Airflow.

Appreciate your help.

Vishal786btc asked Dec 21 '17 16:12





1 Answer

In my experience, Airflow is the best data pipeline tool right now. It's best suited for managing complex, long-running workflows. Its UI and modularity are excellent.

Airflow

  • + Python Code for DAGs
  • + Has connectors for every major service/cloud provider
  • + More versatile
  • + Advanced metrics
  • + Better UI and API
  • + Capable of creating extremely complex workflows
  • + Jinja Templating
  • + Can be used as an Orchestrator for the Tensorflow Extended ecosystem
  • = Can be parallelized
  • = Native connections to HDFS, Hive, Pig, etc.
  • = Graph as DAG

Oozie

  • --- Java or XML for DAGs
  • - hard to build complex pipelines
  • - smaller, less active community
  • - worse web GUI
  • - Java API
  • = Can be parallelized
  • = Native connections to HDFS, Hive, Pig, etc.
  • = Graph as DAG
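
Both lists share the points "Graph as DAG" and "Can be parallelized". To make that concrete, here is a minimal sketch using only the Python standard library (Python 3.9+). It is not the Airflow or Oozie API; the task names, `run_task`, and `run_pipeline` are all made up for illustration. It runs tasks in dependency order and executes tasks that do not depend on each other in the same parallel "wave".

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"extract"},
    "load": {"clean", "aggregate"},
}


def run_task(name):
    # Placeholder for real work (a Hive query, a Pig script, ...).
    return name


def run_pipeline(deps, max_workers=2):
    """Run tasks in dependency order; independent tasks share a wave."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Tasks whose dependencies have all finished.
            ready = sorted(sorter.get_ready())
            # Run the whole wave concurrently and wait for it to finish.
            list(pool.map(run_task, ready))
            sorter.done(*ready)
            waves.append(ready)
    return waves


print(run_pipeline(dependencies))
# [['extract'], ['aggregate', 'clean'], ['load']]
```

Running in waves is a simplification: a real scheduler would typically start a task as soon as its own upstream tasks finish, and would add retries, scheduling, and logging on top.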

As you can see, Airflow is easier to use (especially in a large, heterogeneous team), more versatile, and more powerful than Oozie.

As I said: go with Airflow.

Article you may find interesting

Michele 'Ubik' De Simoni answered Oct 10 '22 12:10
