I am looking for a workflow tool to run complex map-reduce jobs. I have Oozie in mind but also want to explore Cascading. Is there any sample code or example that chains existing M/R jobs using cascading API? Also, can you provide the comparison Oozie Vs Cascading?
The Airflow UI is much better than Hue (Oozie UI),for example: Airflow UI has a Tree view to track task failures unlike Hue, which tracks only job failure. The Airflow UI also lets you view your workflow code, which the Hue UI does not.
Oozie additionally supports subworkflow and allows workflow node properties to be parameterized and dynamically evaluated using EL function. In contrast, Airflow is a generic workflow orchestration for programmatically authoring, scheduling, and monitoring workflows.
Apache Oozie is a tool for Hadoop operations that allows cluster administrators to build complex data transformations out of multiple component tasks. This provides greater control over jobs and also makes it easier to repeat those jobs at predetermined intervals.
Cascading and Oozie are not in the same category.
Oozie is a workflow scheduler.
Cascading is an API for creating workflows. It is agnostic about schedulers, i.e., it should run with whatever scheduler system that you use.
There is perhaps some confusion because the Oozie docs mention a "DAG", and both run atop Hadoop.
Also, Cascading has a notion of "data availability" in the checkpoint support, which is supported in Oozie, albeit differently.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With