What is the significance of the Oozie MR launcher?

Tags: hadoop, mapreduce, oozie

I created a simple Oozie workflow with Sqoop, Hive and Pig actions. For each of these actions, Oozie launches an MR launcher, which in turn launches the action (Sqoop/Hive/Pig). So there are a total of 6 MR jobs for the 3 actions in the workflow.

Why does Oozie start an MR launcher to start the action and not directly start the action?

asked Oct 21 '13 07:10

Praveen Sripati

1 Answer

I posted the same question in the Apache Flume forums, and here is the response:

It's also to keep the Oozie server from being bogged down or becoming unstable. For example, if you have a bunch of workflows running Pig jobs, the Oozie server would otherwise be running multiple copies of the Pig client (a relatively "heavy" program) directly. By moving all of the user code and external clients into map tasks in the launcher job, the Oozie server stays lightweight and less prone to errors. It is also much more scalable this way, because the launcher jobs distribute the job launching and monitoring to other machines in the cluster; otherwise, with the Oozie server doing everything, we'd have to limit the number of concurrent workflows based on the Oozie server's machine specs (RAM, CPU, etc.). And finally, from an architectural standpoint, the Oozie server itself is stateless; that is, everything is stored in the database, and the Oozie server can be taken down at any point without losing anything. If we were to launch jobs directly from the Oozie server, the server would hold some state (e.g. a running Pig client, which cannot be restarted and resumed).
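To make the pattern concrete, here is a toy Python model of the idea described above. It is only a sketch, not Oozie code: `ToyCluster`, `ToyServer`, `run_launcher` and the status strings are all made-up names, and the assumption that each action produces exactly one job of its own is a simplification that matches the 6-jobs-for-3-actions count in the question. The point is that the server only submits a small launcher job and records status in a database, so a fresh server instance can pick up where the old one left off.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for the Hadoop cluster; records submitted jobs."""
    jobs: list = field(default_factory=list)

    def submit(self, name):
        self.jobs.append(name)


class ToyServer:
    """Hypothetical stand-in for the Oozie server.

    It keeps no workflow state on the instance itself; every status
    lives in the shared `database` dict (standing in for the real DB).
    """

    def __init__(self, database, cluster):
        self.database = database
        self.cluster = cluster

    def start_action(self, action):
        # The server only submits a lightweight launcher job.
        self.cluster.submit(f"launcher:{action}")
        self.database[action] = "RUNNING"

    def on_action_finished(self, action):
        # Simplified stand-in for the server learning that the action ended.
        self.database[action] = "SUCCEEDED"


def run_launcher(cluster, action):
    """Runs on the cluster: the heavy client (Pig, Hive, Sqoop) starts here
    and submits the action's own job (assumed to be exactly one)."""
    cluster.submit(f"action:{action}")


actions = ["sqoop", "hive", "pig"]
database = {}
cluster = ToyCluster()
server = ToyServer(database, cluster)

for action in actions:
    server.start_action(action)
    run_launcher(cluster, action)

# Simulate taking the server down and bringing up a new instance:
# nothing is lost, because the state was never held by the server object.
server = ToyServer(database, cluster)
for action in actions:
    server.on_action_finished(action)

print(len(cluster.jobs))  # 6 (one launcher job plus one action job per action)
print(database)
```

If the heavy client ran inside `ToyServer` instead, replacing the server object mid-workflow would lose the running client, which is the statefulness problem the response describes.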

answered Oct 16 '22 07:10

Praveen Sripati
