According to http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/, the formula for determining the number of concurrently running tasks per node is:
min (yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb,
yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores) .
However, on setting these parameters to (for a cluster of c3.2xlarges):
yarn.nodemanager.resource.memory-mb = 14336
mapreduce.map.memory.mb = 2048
yarn.nodemanager.resource.cpu-vcores = 8
mapreduce.map.cpu.vcores = 1,
I find I'm only getting up to 4 tasks running concurrently per node when the formula says 7 should be. What's the deal?
I'm running Hadoop 2.4.0 on AMI 3.1.0.
My empirical formula was incorrect. The formula provided by Cloudera is the correct one and appears to give the expected number of concurrently running tasks, at least on AMI 3.3.1.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With