
Automatically set maximum number of map tasks per node to the number of cores?

Tags:

hadoop

I'm setting up a Hadoop cluster whose nodes are fairly heterogeneous, i.e. they each have a different number of cores. Currently I have to manually edit the mapred-site.xml on each node to fill in {cores}:

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>{cores}</value>
</property>

Is there an easier way to do this when I add new nodes? Most of the other values are defaults; the maximum number of map tasks is the only setting that changes from node to node.

job asked Oct 21 '22 11:10



1 Answer

If you're comfortable with some scripting, the following gives you the number of 'processors' on each machine (the term means different things on different architectures, but it is more or less what you want):

grep -c '^processor' /proc/cpuinfo

You can then use sed, or an equivalent tool, to write that number into your mapred-site.xml file.

So putting this all together:

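# Count the logical processors listed in /proc/cpuinfo
CORES=$(grep -c '^processor' /proc/cpuinfo)
# Replace every {cores} placeholder in the file, in place, with that count
sed -i "s/{cores}/$CORES/g" mapred-site.xml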
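One caveat with the in-place sed: it consumes the {cores} placeholder, so it only works the first time it runs on a given file. A minimal sketch of a re-runnable variant follows, assuming you keep an untouched template next to the real config. The function names count_cores and render_mapred_site and the file name mapred-site.xml.template are hypothetical, not anything Hadoop provides.

```shell
#!/bin/sh
# Sketch only: function and file names here are illustrative assumptions.

# Count logical processors. The anchored pattern avoids matching other
# /proc/cpuinfo fields that merely contain the word "processor".
# An alternative file can be passed as $1 (useful for testing).
count_cores() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Replace every {cores} placeholder in template $1 with the value $2 and
# write the result to $3. The template itself is left untouched, so the
# step can be repeated whenever the node's hardware changes.
render_mapred_site() {
    sed "s/{cores}/$2/g" "$1" > "$3"
}

# Example usage (paths are assumptions):
# render_mapred_site mapred-site.xml.template "$(count_cores)" mapred-site.xml
```

Keeping the template separate also means the same file can be copied to every new node unchanged, with only the rendering step run locally.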

A footnote: you probably don't want to set both the number of mappers and the number of reducers to the number of cores. It is more likely that you want to split the cores between the two task types and keep a core spare for the DataNode, TaskTracker, and so on.
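As a rough illustration of that split, here is a sketch that reserves one core for the daemons and divides the rest between map and reduce slots. The policy itself (one spare core, roughly two thirds of the remainder to maps) is an assumption chosen for the example, not a recommendation from Hadoop, and split_slots and the {map_slots} / {reduce_slots} placeholders are hypothetical names. The reduce-side counterpart of the property in the question is mapred.tasktracker.reduce.tasks.maximum.

```shell
#!/bin/sh
# Sketch only: the split policy and all names below are illustrative assumptions.

# Print "<map_slots> <reduce_slots>" for a node with $1 cores.
split_slots() {
    cores=$1
    # Keep one core spare for the DataNode and TaskTracker daemons.
    usable=$((cores - 1))
    # Small machines still need at least one slot of each type.
    if [ "$usable" -lt 2 ]; then
        usable=2
    fi
    # Give about two thirds of the usable cores to map tasks (rounded down).
    map_slots=$((usable * 2 / 3))
    reduce_slots=$((usable - map_slots))
    echo "$map_slots $reduce_slots"
}

# Example usage (the template and its placeholders are assumptions):
# set -- $(split_slots "$(grep -c '^processor' /proc/cpuinfo)")
# sed -e "s/{map_slots}/$1/g" -e "s/{reduce_slots}/$2/g" \
#     mapred-site.xml.template > mapred-site.xml
```

The right ratio depends on the workload, so treat the two-thirds figure as a starting point to tune rather than a fixed rule.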

Chris White answered Nov 15 '22 09:11
