
How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce

According to http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/, the formula for determining the number of concurrently running tasks per node is:

min(yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb,
    yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores)

However, after setting these parameters as follows (on a cluster of c3.2xlarge instances):

yarn.nodemanager.resource.memory-mb = 14336

mapreduce.map.memory.mb = 2048

yarn.nodemanager.resource.cpu-vcores = 8

mapreduce.map.cpu.vcores = 1

I get at most 4 tasks running concurrently per node, whereas the formula says there should be 7. What's the deal?

I'm running Hadoop 2.4.0 on AMI 3.1.0.

verve asked Aug 07 '14 22:08


1 Answer

My empirical formula was incorrect. The formula provided by Cloudera is the correct one and appears to give the expected number of concurrently running tasks, at least on AMI 3.3.1.
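As a sanity check, here is a minimal sketch of the Cloudera formula applied to the map-task values from the question. The function name `max_concurrent_tasks` is made up for illustration, and the use of integer (floor) division is my assumption, on the grounds that a node can only run whole containers. It ignores anything else that may occupy resources on the node, such as the ApplicationMaster container.

```python
def max_concurrent_tasks(node_memory_mb, task_memory_mb, node_vcores, task_vcores):
    """Upper bound on concurrently running tasks of one type per node.

    Hypothetical helper: implements
    min(node memory / task memory, node vcores / task vcores),
    using floor division (an assumption: containers are whole units).
    """
    by_memory = node_memory_mb // task_memory_mb   # memory-limited count
    by_vcores = node_vcores // task_vcores         # vcore-limited count
    return min(by_memory, by_vcores)


# Values from the question (map tasks on a c3.2xlarge node)
map_slots = max_concurrent_tasks(
    node_memory_mb=14336,  # yarn.nodemanager.resource.memory-mb
    task_memory_mb=2048,   # mapreduce.map.memory.mb
    node_vcores=8,         # yarn.nodemanager.resource.cpu-vcores
    task_vcores=1,         # mapreduce.map.cpu.vcores
)
print(map_slots)  # 7
```

With these settings memory is the limiting resource (14336 / 2048 = 7, versus 8 / 1 = 8 for vcores), so 7 map tasks per node is what the formula predicts.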

verve answered Sep 23 '22 22:09