What is the purpose of "uber mode" in hadoop?

Question

Hi, I am a big data newbie. I searched all over the internet to find out what exactly uber mode is, but the more I searched, the more confused I got. Can anybody please help me by answering my questions?

What does uber mode do?
Does it work differently in mapred 1.x and 2.x?
And where can I find the setting for it?

ableHercules · Accepted Answer

What is UBER mode in Hadoop2?

Normally, the mappers and reducers run in containers allocated by the ResourceManager (RM): the MapReduce ApplicationMaster (AM) requests a separate container for each map and reduce task. The uber configuration allows the mappers and reducers to run in the same process as the AM.

Uber jobs:

Uber jobs are jobs that are executed within the MapReduce ApplicationMaster. Rather than communicating with the RM to create the mapper and reducer containers, the AM runs the map and reduce tasks within its own process, avoiding the overhead of launching and communicating with remote containers.
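To make the idea concrete, here is a toy Python model (not Hadoop code) of what "running the tasks inside the AM" means: every map task runs one after another in the current process, followed by the single reduce task, and no containers are launched. The names run_in_one_process, word_count_map and word_count_reduce are hypothetical, and the sketch skips the sort, spill and shuffle steps that real map and reduce tasks still go through.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical function types for this toy model (not Hadoop APIs).
MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], int]


def run_in_one_process(splits: List[List[str]], map_fn: MapFn,
                       reduce_fn: ReduceFn) -> Dict[str, int]:
    """Run all map tasks, then one reduce task, sequentially in this process,
    the way an uber job runs its tasks inside the AM instead of in containers."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for split in splits:  # one split == one map task, executed one at a time
        for record in split:
            for key, value in map_fn(record):
                grouped[key].append(value)
    # A single reduce task (uber mode handles at most one reducer).
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


def word_count_map(line: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]


def word_count_reduce(word: str, counts: List[int]) -> int:
    # Sum the 1s emitted for this word.
    return sum(counts)


print(run_in_one_process([["a b a"], ["b c"]], word_count_map, word_count_reduce))
# {'a': 2, 'b': 2, 'c': 1}
```

In normal (non-uber) mode, each of those two map tasks and the reduce task would instead get its own container from the RM.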

Why?

If you have a small dataset, or want to run MapReduce on a small amount of data, the uber configuration helps by cutting out the extra time MapReduce normally spends allocating and launching separate containers for the map and reduce tasks.

Can I configure uber mode for all MapReduce jobs?

As of now, only map-only jobs and jobs with one reducer are supported.

Navneet Kumar · Answer

An uber job occurs when multiple mappers and reducers are combined to run in a single container. There are four core settings for configuring uber jobs in mapred-site.xml:

mapreduce.job.ubertask.enable
mapreduce.job.ubertask.maxmaps
mapreduce.job.ubertask.maxreduces
mapreduce.job.ubertask.maxbytes
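For reference, here is a small Python sketch that writes these four properties out in the usual mapred-site.xml <configuration>/<property> layout. The values are only illustrative: mapreduce.job.ubertask.enable defaults to false, maxmaps to 9 and maxreduces to 1 (more than one reducer is not supported), and when maxbytes is not set the HDFS block size is used instead. The 134217728 bytes below assumes the 128 MB Hadoop 2.x default block size, and the function name build_mapred_site is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative values (assumptions, not recommendations).
# maxmaps/maxreduces equal the stock defaults; 134217728 bytes = 128 MB.
UBER_SETTINGS = {
    "mapreduce.job.ubertask.enable": "true",
    "mapreduce.job.ubertask.maxmaps": "9",
    "mapreduce.job.ubertask.maxreduces": "1",
    "mapreduce.job.ubertask.maxbytes": "134217728",
}


def build_mapred_site(settings: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(build_mapred_site(UBER_SETTINGS))
```

The same properties can also be set per job instead of cluster-wide, for example by passing -D mapreduce.job.ubertask.enable=true on the hadoop jar command line for drivers that use ToolRunner/GenericOptionsParser.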

You can find more details here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.15/bk_using-apache-hadoop/content/uber_jobs.html

Shubham Chaurasia · Answer

In Hadoop 2.x, uber jobs are jobs that are launched in the MapReduce ApplicationMaster itself, i.e. no separate containers are created for the map and reduce tasks, which saves the overhead of creating containers and communicating with them.

As far as how it works in Hadoop 1.x versus 2.x is concerned, I suppose the difference is only in the terminology of 1.x and 2.x; there is no difference in how it works.

The configuration parameters are the same as those mentioned by Navneet Kumar in his answer.
PS: Use it only with small datasets.

Azim · Answer

Pretty good answers have already been given for "What is uber mode?", so I'll just add some more information on "Why?"

The application master decides how to run the tasks that make up the MapReduce job. If the job is small, the application master may choose to run the tasks in the same JVM as itself. This happens when it judges that the overhead of allocating and running tasks in new containers outweighs the gain from running them in parallel, compared with running them sequentially on one node.

Now the question arises: what qualifies as a small job?

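By default, a small job is one that has fewer than 10 mappers, at most one reducer, and an input size that is less than the size of one HDFS block. Note that these thresholds only come into play when mapreduce.job.ubertask.enable is set to true; uber mode is disabled by default.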
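The rule of thumb above can be written as a simple predicate. This is a simplified Python sketch, not Hadoop's actual implementation: UberThresholds and is_uber_candidate are hypothetical names, the 128 MB block size is an assumption, the inclusive (<=) boundary checks are a simplification, and the real check inside the ApplicationMaster also looks at things like the memory and CPU the tasks need.

```python
from dataclasses import dataclass

# Assumed HDFS block size: 128 MB, the Hadoop 2.x default for dfs.blocksize.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class UberThresholds:
    """Illustrative mirror of the mapreduce.job.ubertask.* settings."""
    enabled: bool = False        # mapreduce.job.ubertask.enable (off by default)
    max_maps: int = 9            # mapreduce.job.ubertask.maxmaps, i.e. "fewer than 10"
    max_bytes: int = BLOCK_SIZE  # mapreduce.job.ubertask.maxbytes (block size if unset)


def is_uber_candidate(num_maps: int, num_reduces: int, input_bytes: int,
                      t: UberThresholds) -> bool:
    """Simplified 'small job' test; ignores memory/CPU limits and other checks."""
    return (t.enabled
            and num_maps <= t.max_maps
            and num_reduces <= 1          # map-only or a single reducer
            and input_bytes <= t.max_bytes)


on = UberThresholds(enabled=True)
print(is_uber_candidate(4, 1, 10 * 1024 * 1024, on))                # True
print(is_uber_candidate(12, 1, 10 * 1024 * 1024, on))               # False: too many maps
print(is_uber_candidate(4, 1, 10 * 1024 * 1024, UberThresholds()))  # False: uber mode disabled
```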


Tags: hadoop, mapreduce

Mohammed Asad

