 

What is the purpose of "uber mode" in hadoop?

Hi, I am a big data newbie. I searched all over the internet to find out what exactly uber mode is, but the more I searched the more confused I got. Can anybody please help me by answering my questions?

  • What does uber mode do?
  • Does it work differently in mapred 1.x and 2.x?
  • And where can I find the setting for it?
Asked May 17 '15 by Mohammed Asad


4 Answers

What is UBER mode in Hadoop2?

Normally, the ApplicationMaster (AM) requests a separate container from the ResourceManager (RM) for each mapper and reducer. The uber configuration instead allows the mappers and reducers to run in the same process as the AM itself.

Uber jobs:

Uber jobs are jobs that are executed within the MapReduce ApplicationMaster. Rather than communicating with the RM to create the mapper and reducer containers, the AM runs the map and reduce tasks within its own process, avoiding the overhead of launching and communicating with remote containers.

Why

If you have a small dataset, or you want to run MapReduce on a small amount of data, the uber configuration will help you out by reducing the additional time that MapReduce normally spends setting up containers for the mapper and reducer phases.

Can I configure uber mode for all MapReduce jobs?

As of now, only map-only jobs and jobs with a single reducer are supported.

Answered Nov 09 '22 by ableHercules


An uber job occurs when the mappers and reducers of a job are combined to run in a single container. There are four core settings for uber jobs in mapred-site.xml. Configuration options for uber jobs:

  • mapreduce.job.ubertask.enable
  • mapreduce.job.ubertask.maxmaps
  • mapreduce.job.ubertask.maxreduces
  • mapreduce.job.ubertask.maxbytes

You can find more details here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.15/bk_using-apache-hadoop/content/uber_jobs.html
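To make the four settings concrete, here is a minimal mapred-site.xml sketch. The values shown are the usual defaults (maxmaps 9, maxreduces 1, maxbytes falling back to one HDFS block, 128 MB in this example); treat them as illustrative, not as tuning recommendations.

```xml
<configuration>
  <!-- Enable uber (in-AM) execution for jobs small enough to qualify -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- Maximum number of maps a job may have and still be uberized -->
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>9</value>
  </property>
  <!-- Maximum number of reduces (at most 1 is supported) -->
  <property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
  </property>
  <!-- Maximum total input size in bytes; defaults to the HDFS block size -->
  <property>
    <name>mapreduce.job.ubertask.maxbytes</name>
    <value>134217728</value>
  </property>
</configuration>
```

The same properties can also be passed per job on the command line or set on the job's Configuration object in the driver.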

Answered Nov 09 '22 by Navneet Kumar


In terms of Hadoop 2.x, uber jobs are jobs which are launched within the MapReduce ApplicationMaster itself, i.e. no separate containers are created for the map and reduce tasks, and hence the overhead of creating containers and communicating with them is saved.

As far as the working (in Hadoop 1.x versus 2.x) is concerned, I suppose the difference is only observable in the terminologies of 1.x and 2.x; there is no difference in how it works.

The configuration params are the same as those mentioned by Navneet Kumar in his answer.
PS: Use it only with small datasets.

Answered Nov 09 '22 by Shubham Chaurasia


Pretty good answers are given for "What is uber mode?", so just to add some more information on the "Why?":

The application master decides how to run the tasks that make up the MapReduce job. If the job is small, the application master may choose to run the tasks in the same JVM as itself. This happens when it judges the overhead of allocating and running tasks in new containers outweighs the gain in running them in parallel, when compared to running them sequentially on one node.

Now, the question could be raised: "What qualifies as a small job?"

By default, a small job is one that has fewer than 10 mappers, only one reducer, and an input size that is less than the size of one HDFS block.
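As a sketch, that default "small job" test can be expressed as a simple predicate. The function name and the 128 MB block size here are illustrative assumptions; the real check is performed inside the MapReduce ApplicationMaster using the mapreduce.job.ubertask.* settings.

```python
# Sketch of the default uber-eligibility check (names are illustrative).
# Defaults mirror mapreduce.job.ubertask.maxmaps=9, maxreduces=1, and
# maxbytes = one HDFS block (assumed 128 MB here).

DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size in bytes

def is_uber_eligible(num_maps, num_reduces, input_bytes,
                     max_maps=9, max_reduces=1,
                     max_bytes=DEFAULT_BLOCK_SIZE):
    """Return True if a job is small enough to run inside the AM's JVM."""
    return (num_maps <= max_maps
            and num_reduces <= max_reduces
            and input_bytes <= max_bytes)

# A tiny job qualifies; a job with two reducers does not.
print(is_uber_eligible(4, 1, 10 * 1024 * 1024))   # True
print(is_uber_eligible(4, 2, 10 * 1024 * 1024))   # False
```

If any of the three conditions fails, the AM falls back to requesting normal containers for the tasks.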

Answered Nov 09 '22 by Azim