 

How to schedule Hadoop Map tasks in multi-core 8 node cluster?

I have a "map only" program (no reduce phase). The input file is large enough to create 7 map tasks, which I verified from the output files produced (part-000 to part-006). My cluster has 8 nodes, each with 8 cores and 8 GB of memory, and a shared filesystem hosted on the head node.

My question is: can I choose between running all 7 map tasks on a single node and running them on 7 different slave nodes (one task per node)? If so, what changes are needed in my code and configuration file?

I tried setting the parameter "mapred.tasktracker.map.tasks.maximum" to 1 and to 7 in my code only, but I did not see any appreciable difference in running time. In my configuration file it is set to 1.

asked Apr 29 '12 at 15:04 by justin waugh


1 Answer

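"mapred.tasktracker.map.tasks.maximum" controls how many map tasks each node may run at the same time, not how many nodes a job's map tasks are spread over. In the Hadoop architecture there is one TaskTracker on each slave node and one JobTracker on the master node. Setting mapred.tasktracker.map.tasks.maximum therefore only changes the number of map slots each TaskTracker offers, i.e. the maximum number of map tasks running concurrently on that node. With the value set to 1 on every slave, the 7 map tasks cannot share a node, so they run on 7 different nodes; a higher value allows the JobTracker to place several of them on the same node. Because each TaskTracker reads this property from its own configuration when it starts, setting it in the job's code has no effect: it has to be changed in mapred-site.xml on the slave nodes, followed by a TaskTracker restart. A common rule of thumb is to set it between half and twice the number of cores per node.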
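The slot arithmetic behind this can be sketched as a small standalone Java program. It is only a back-of-the-envelope illustration, assuming every node offers the same number of map slots and all slots are free; the class and method names are hypothetical, and it says nothing about which nodes the JobTracker's scheduler actually picks.

```java
// Hypothetical back-of-the-envelope helper for a Hadoop 1.x-style cluster in which every
// TaskTracker offers the same number of map slots and all slots are free.
public class MapSlotMath {

    // Fewest nodes that can host all tasks at the same time (ceiling division).
    static int minNodesForOneWave(int tasks, int slotsPerNode) {
        return (tasks + slotsPerNode - 1) / slotsPerNode;
    }

    // Number of rounds ("waves") needed when all slots in the cluster are used.
    static int waves(int tasks, int nodes, int slotsPerNode) {
        int totalSlots = nodes * slotsPerNode;
        return (tasks + totalSlots - 1) / totalSlots;
    }

    public static void main(String[] args) {
        int tasks = 7;
        int nodes = 8;
        for (int slotsPerNode : new int[] {1, 7, 8}) {
            System.out.printf("slots/node=%d: all %d tasks fit on as few as %d node(s), in %d wave(s)%n",
                    slotsPerNode, tasks, minNodesForOneWave(tasks, slotsPerNode),
                    waves(tasks, nodes, slotsPerNode));
        }
    }
}
```

With 1 slot per node at least 7 nodes are needed, so the tasks are spread out; with 7 or 8 slots per node a single node could hold all of them, but where they actually run is up to the scheduler, not this arithmetic.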

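The overall number of map tasks you want can be requested with JobConf.setNumMapTasks(int). Note that this is only a hint: the actual number of map tasks equals the number of input splits produced by the job's InputFormat.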
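To see why the value passed to setNumMapTasks(int) is only a hint, here is a simplified sketch modeled on my reading of how the old-API (org.apache.hadoop.mapred) FileInputFormat sizes its splits: the goal size is the input size divided by the hint, capped at the HDFS block size and never smaller than the minimum split size. It assumes a single splittable input file; the class name, file size and block size are hypothetical, and it ignores multiple input files, compression and data locality.

```java
// Simplified sketch modeled on the old-API (org.apache.hadoop.mapred) FileInputFormat
// split computation. Assumes a single splittable input file; class name is hypothetical.
public class SplitSketch {
    // The last split may be up to 10% larger than splitSize instead of creating a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    // Goal size from the hint, capped at the block size, but never below the minimum split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits (and therefore map tasks) produced for one file.
    static int countSplits(long fileLength, int numMapTasksHint, long minSize, long blockSize) {
        long goalSize = fileLength / Math.max(numMapTasksHint, 1);
        long splitSize = computeSplitSize(goalSize, Math.max(minSize, 1), blockSize);
        long bytesRemaining = fileLength;
        int splits = 0;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // leftover bytes become the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 448 * mb; // hypothetical input file size
        long blockSize = 64 * mb;   // hypothetical HDFS block size
        System.out.println(countSplits(fileLength, 1, 1, blockSize));        // 7: a low hint is overridden by the block size
        System.out.println(countSplits(fileLength, 14, 1, blockSize));       // 14: a higher hint shrinks the splits
        System.out.println(countSplits(fileLength, 14, 64 * mb, blockSize)); // 7: a large minimum split size wins
    }
}
```

Under this logic a hint below the number of blocks is effectively ignored; getting fewer, larger splits would require raising the minimum split size instead.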

answered Nov 01 '22 at 15:11 by Chaos