
How to set the number of map tasks in hadoop 0.20?

I'm trying to set the number of map tasks that run in a Hadoop 0.20 environment.

I am using the old API.

Here are the options I've tried so far:

    conf.set("mapred.tasktracker.map.tasks.maximum", "5");
    conf.set("mapred.map.tasks", "10");
    conf.set("mapred.map.tasksperslot", "5");
    conf.set("mapred.tasktracker.map", "5");
    conf.set("mapred.map.parallel.copies", "5");

With all of those set, the number of map tasks running in parallel remains 2.

What are the proper options to set to get the number of mappers running in parallel up to 5?

Arsen Zahray asked Sep 19 '11 21:09



1 Answer

In TaskTracker.java, the tasktracker reads its number of map slots from its own configuration, defaulting to 2:

    maxCurrentMapTasks = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);

So setting the property on the client side is of no use. You need to set it in the tasktracker's configuration file. According to "Hadoop: The Definitive Guide":

Be aware that some properties have no effect when set in the client configuration. For example, if in your job submission you set mapred.tasktracker.map.tasks.maximum with the expectation that it would change the number of task slots for the tasktrackers running your job, then you would be disappointed, since this property is only honored if set in the tasktracker’s mapred-site.xml file. In general, you can tell the component where a property should be set by its name, so the fact that mapred.tasktracker.map.tasks.maximum starts with mapred.tasktracker gives you a clue that it can be set only for the tasktracker daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even reading the source.
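As a minimal sketch of what that looks like in practice, the property would go into mapred-site.xml on each tasktracker node rather than into the job's conf. The value 5 is taken from the question. The exact location of the file depends on the installation, and it is assumed here that the tasktracker has to be restarted to pick up the change, since it is a daemon-side setting.

```xml
<!-- mapred-site.xml on each tasktracker node (location varies by installation) -->
<configuration>
  <property>
    <!-- Maximum number of map tasks this tasktracker runs at the same time -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- 5 is the value the question asks for; the default is 2 -->
    <value>5</value>
  </property>
</configuration>
```

This is a per-node limit, so the number of mappers running in parallel across the cluster also depends on how many tasktrackers there are.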

Praveen Sripati answered Sep 26 '22 10:09
