Automatically set maximum number of map tasks per node to the number of cores?

Question

I'm setting up a Hadoop cluster whose nodes are fairly heterogeneous, i.e. they have different numbers of cores. Currently I have to manually edit mapred-site.xml on each node to fill in {cores}:

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>{cores}</value>
</property>

Is there an easier way to do this when I add new nodes? Most of the other values are defaults, and the maximum number of map tasks is the only thing that changes from node to node.

Chris White · Accepted Answer

If you're comfortable with some scripting, the following gives you the number of 'processors' on each machine (which means different things on different architectures, but is more or less what you want):

grep -c '^processor' /proc/cpuinfo

Then you can use sed or some equivalent to update your mapred-site.xml file according to the output of this.

So putting this all together:

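# Count the logical processors listed in /proc/cpuinfo (Linux only)
CORES=$(grep -c '^processor' /proc/cpuinfo)
# Replace the {cores} placeholder in place
sed -i "s/{cores}/$CORES/g" mapred-site.xml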
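One caveat with the in-place edit is that it only works while the {cores} placeholder is still in the file. A minimal sketch of a repeatable variant follows, assuming you keep an untouched copy of the config as a template; the file name mapred-site.xml.template and the function names render_mapred_site and count_cores are made up for illustration and are not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: render the config from a template that still contains
# the {cores} placeholder, so the step can be re-run safely on any node.
render_mapred_site() {
    template=$1   # e.g. mapred-site.xml.template (assumed name)
    output=$2     # e.g. mapred-site.xml
    cores=$3      # value substituted for {cores}
    sed "s/{cores}/$cores/g" "$template" > "$output"
}

# Count logical processors from /proc/cpuinfo (Linux only).
# Falls back to 1 if the file is missing or lists no processors.
count_cores() {
    n=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
    [ "${n:-0}" -gt 0 ] || n=1
    echo "$n"
}

# Example use on a node:
# render_mapred_site mapred-site.xml.template mapred-site.xml "$(count_cores)"
```

Because the template is never modified, the same script can be run again after a hardware change without restoring the placeholder by hand.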

As a footnote, you probably don't want to set both the maximum number of mappers and the maximum number of reducers to the number of cores. It is usually better to split the cores between the two task types and leave a core spare for the DataNode, TaskTracker, etc.
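The split described in the footnote could be sketched roughly as below. The ratio is an assumption rather than a recommendation from this answer: one core is held back for the daemons and about two thirds of the remainder go to map slots. The function name split_slots, the {map_slots} and {reduce_slots} placeholders, and the mapred-site.xml.template file name are all hypothetical; the reduce-side value would go into mapred.tasktracker.reduce.tasks.maximum.

```shell
#!/bin/sh
# Sketch: derive map and reduce slot counts from a core count.
# Assumptions (not from the answer): one core is reserved for the DataNode
# and TaskTracker daemons, and roughly two thirds of the rest go to mappers.
split_slots() {
    cores=$1
    usable=$((cores - 1))
    # Keep at least one slot of each type, even on one- or two-core machines.
    [ "$usable" -ge 2 ] || usable=2
    map_slots=$((usable * 2 / 3))
    reduce_slots=$((usable - map_slots))
    echo "$map_slots $reduce_slots"
}

# Example use with the hypothetical placeholders and template file:
# set -- $(split_slots "$(grep -c '^processor' /proc/cpuinfo)")
# sed -e "s/{map_slots}/$1/g" -e "s/{reduce_slots}/$2/g" \
#     mapred-site.xml.template > mapred-site.xml
```

With this split, an 8-core node gets 4 map slots and 3 reduce slots. Tune the ratio to your own workload, since map-heavy and reduce-heavy jobs will want different values.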

Tags: hadoop, job
