Hadoop - increasing map tasks in XML doesn't increase the number of map tasks at runtime

I added the following to my conf/mapred-site.xml:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

But when I run the job, it still runs only 2 map tasks (the default). How can I force this number to increase?

P.S. I am using an Ubuntu quad-core box.

Thank you

asked Oct 07 '11 at 23:10 by daydreamer



2 Answers

Are you running over a small amount of data? It could be that your MapReduce job is running over only one input split and thus does not require more mappers. Try running your job over hundreds of MB of data instead and see if you still have the same issue.
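To see why a small input yields few mappers, here is a simplified Python model of how one splittable file is cut into input splits, with one map task per split. It assumes a fixed split size (64 MB, a common HDFS block size in Hadoop 1.x, used here as an assumed value) and mirrors the roughly 10% "slop" that, as far as we know, FileInputFormat allows for the last split. `count_splits` is a hypothetical helper, not a Hadoop API.

```python
# Simplified model (not Hadoop code): one map task per input split.
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: the last split may be up to 10% larger than split_size


def count_splits(file_size: int, split_size: int) -> int:
    """Return how many input splits (map tasks) one splittable file produces."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (at most 1.1 * split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


print(count_splits(10 * MB, 64 * MB))   # → 1: small input, a single mapper
print(count_splits(70 * MB, 64 * MB))   # → 1: still within the 10% slop
print(count_splits(300 * MB, 64 * MB))  # → 5
```

In this model, a few MB of input produce a single split no matter how many map slots the TaskTracker offers; with hundreds of MB the count grows roughly as input size divided by split size.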

The maximum number of tasks that can run simultaneously on a single node has nothing to do with the number of map tasks a job has. Your job could have 20 map tasks while your cluster has 5 map slots, and it will just take longer. Or your cluster could have 50 map slots while your job has only 2 map tasks.
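The slots-versus-tasks distinction can be put in numbers. Assuming every map task takes about the same time (a simplification), a job needs roughly ceil(tasks / slots) "waves", and at most min(tasks, slots) map tasks run at once. The cluster's slot count here is the number of TaskTrackers times mapred.tasktracker.map.tasks.maximum; the helper names are hypothetical.

```python
import math


def map_waves(num_map_tasks: int, cluster_map_slots: int) -> int:
    """Rough number of waves when each slot runs one task at a time."""
    return math.ceil(num_map_tasks / cluster_map_slots)


def concurrent_maps(num_map_tasks: int, cluster_map_slots: int) -> int:
    """Map tasks that can run at the same time: never more than the job has."""
    return min(num_map_tasks, cluster_map_slots)


# 20 map tasks on 5 slots: every task runs, it just takes about 4 waves.
print(map_waves(20, 5), concurrent_maps(20, 5))  # → 4 5
# 50 slots but only 2 map tasks: 48 slots sit idle.
print(map_waves(2, 50), concurrent_maps(2, 50))  # → 1 2
# One quad-core TaskTracker with the maximum set to 4, job with 2 maps.
slots = 1 * 4  # nodes * mapred.tasktracker.map.tasks.maximum
print(concurrent_maps(2, slots))                 # → 2
```

Raising the per-node maximum only adds slots; it cannot create map tasks the job does not have.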

answered Sep 23 '22 at 01:09 by Donald Miner



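mapred.tasktracker.map.tasks.maximum is the maximum number of tasks a TaskTracker can run simultaneously. If you want to set the number of map tasks for the job as a whole, set mapred.map.tasks to 4. Note that mapred.map.tasks is only a hint: the actual number of map tasks depends on the number of input splits the job's InputFormat generates.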
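As for how mapred.map.tasks can change the count at all: in the older org.apache.hadoop.mapred API, FileInputFormat (as we understand its source) turns the hint into a goal split size, total input size divided by the hint, and then clamps it between the minimum split size and the block size. The Python below is a simplified model of that logic, not Hadoop code; it ignores non-splittable files, zero-length files and data locality. `old_api_num_splits` is a hypothetical name, and the 64 MB block size is an assumed value.

```python
# Simplified model of old-API FileInputFormat split sizing (not Hadoop code).
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: the last split may be up to 10% larger


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    # The hint can shrink splits below the block size, never grow them past it.
    return max(min_size, min(goal_size, block_size))


def old_api_num_splits(file_sizes, map_tasks_hint, block_size=64 * MB, min_size=1):
    """Return the modelled number of map tasks for the given input file sizes."""
    total = sum(file_sizes)
    goal_size = total // max(map_tasks_hint, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    splits = 0
    for size in file_sizes:
        remaining = size
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining > 0:
            splits += 1
    return splits


print(old_api_num_splits([10 * MB], 2))    # → 2: hint of 2, small file
print(old_api_num_splits([10 * MB], 4))    # → 4: mapred.map.tasks = 4
print(old_api_num_splits([1000 * MB], 4))  # → 16: block size dominates
```

In this model a hint of 2 (as far as we know, the value mapred-default.xml ships for mapred.map.tasks in Hadoop 1.x) gives 2 maps for a small file, and a hint of 4 gives 4; for large inputs the block size, not the hint, sets the count. Jobs written against the newer org.apache.hadoop.mapreduce API size splits from the minimum/maximum split size and block size only, so, to our understanding, the hint does not apply there.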

answered Sep 25 '22 at 01:09 by saiyan
