
How does Oozie handle dependencies?

I have several questions about Oozie 2.3 share libraries.

Currently, I define the share libraries in our coordinator.properties:

oozie.use.system.libpath=true 
oozie.libpath=<hdfs_path>

Here are my questions:

  1. When are the share libraries copied to other data nodes, and how many data nodes receive them?

  2. Are the share libraries copied to other data nodes once per workflow (wf) in a coordinator job, or only once per coordinator job?

asked Jun 14 '12 22:06 by Terminal User




1 Answer

Adding entries to the oozie.libpath property effectively means that Oozie adds those libraries to the mapred.cache.files configuration property (a DistributedCache property) when the actions in your workflow are executed.

Hadoop then takes care of copying those jars to each cluster node that runs tasks for the job, once per job, and the tasks are configured with the jars on the classpath through the mapred.job.classpath.files configuration property.
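
To make the mechanism concrete, here is a toy Python model of the effect described above. It is not Oozie or Hadoop code: the function name `simulate_action_conf` and the jar paths are made up for illustration, and the separators (a comma for mapred.cache.files, a colon for mapred.job.classpath.files) are assumptions about the Hadoop 1.x format rather than something stated in this answer.

```python
def simulate_action_conf(libpath_jars):
    """Toy model: build the two properties an action's job conf would carry.

    libpath_jars is a list of jar paths found under oozie.libpath.
    The separators used here are assumptions, not confirmed Hadoop behavior.
    """
    return {
        # Files that the DistributedCache should ship to the task nodes.
        "mapred.cache.files": ",".join(libpath_jars),
        # The subset of cached files that should be put on the task classpath.
        "mapred.job.classpath.files": ":".join(libpath_jars),
    }


# Hypothetical jars; real entries would come from the HDFS libpath directory.
jars = ["/user/me/lib/a.jar", "/user/me/lib/b.jar"]
conf = simulate_action_conf(jars)

for key, value in conf.items():
    print(key, "=", value)
```

The point of the sketch is only that the same list of jars ends up in both properties, and that this happens separately for every action that is launched.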

So, in response to your second question: the libraries are copied for each action in the workflow, not once per coordinator job. If a wf job has 4 MapReduce actions, the libraries are copied 4 times over the lifetime of that workflow, once per action, and each time only to the tasktrackers that participate in that action's MapReduce job.
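
The arithmetic can be sketched with a small Python model. Everything in it is hypothetical: the tasktracker names, the choice of which nodes take part in each action, and the helper `count_library_copies` exist only to illustrate the counting, assuming one copy per participating node per action as described above.

```python
def count_library_copies(actions):
    """Count how often each tasktracker receives the libraries.

    actions is a list with one entry per workflow action; each entry is the
    set of tasktrackers that run tasks for that action's MapReduce job.
    Assumption: the libraries are copied once per job to each such node.
    """
    copies = {}
    for participating_nodes in actions:
        for node in participating_nodes:
            copies[node] = copies.get(node, 0) + 1
    return copies


# A hypothetical workflow with 4 MapReduce actions on a 3-node cluster.
workflow = [
    {"tt1", "tt2", "tt3"},
    {"tt1", "tt2"},
    {"tt1", "tt2", "tt3"},
    {"tt1"},
]
copies = count_library_copies(workflow)
print(sorted(copies.items()))
```

In this model a tasktracker that takes part in all 4 actions receives the libraries 4 times, while a node that takes part in fewer actions receives them fewer times. If the coordinator materializes the workflow N times, the counts scale by N.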

answered Oct 16 '22 05:10 by Chris White