 

How to set the VCORES in hadoop mapreduce/yarn?

The following is my configuration:

**mapred-site.xml**
map-mb: 4096, opts: -Xmx3072m
reduce-mb: 8192, opts: -Xmx6144m

**yarn-site.xml**
resource memory-mb: 40 GB
min allocation-mb: 1 GB
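
In the actual XML files, this shorthand corresponds to roughly the following (standard Hadoop property names; 40 GB written as 40960 MB):

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```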

The VCores value displayed for my Hadoop cluster is 8, but I don't know how that number is computed or where to configure it.

I hope someone can help me.

asked Oct 23 '14 by Andoy Abarquez

People also ask

What is Vcores in YARN?

As of Hadoop 2.4, YARN introduced the concept of vcores (virtual cores). A vcore is a share of host CPU that the YARN NodeManager allocates to available resources. yarn.scheduler.maximum-allocation-vcores is the maximum allocation for each container request at the ResourceManager, in terms of virtual CPU cores.
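
For example, a yarn-site.xml snippet capping any single container request at 4 vcores (4 is an illustrative value, not a recommendation):

```xml
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
```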

What is YARN NodeManager CPU Vcores?

yarn.nodemanager.resource.cpu-vcores specifies the number of virtual CPUs that a NodeManager can use to create containers when the ResourceManager requests container building. By default it is -1.
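
In yarn-site.xml this looks like the following sketch. The value 8 matches the long-standing fallback default, which is likely where the 8 shown in the question's cluster UI comes from:

```xml
<property>
  <!-- illustrative: advertise 8 vcores per NodeManager -->
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```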

How does YARN allocate memory?

YARN uses the MB of memory and virtual cores per node to allocate and track resource usage. For example, a 5-node cluster with 12 GB of memory allocated per node for YARN has a total memory capacity of 60 GB. For a default 2 GB container size, YARN has room to allocate 30 containers of 2 GB each.

How Hadoop runs a MapReduce job using YARN?

YARN NodeManager: monitors and launches the compute containers on machines in the cluster. YARN ResourceManager: handles the allocation and coordination of compute resources across the cluster. MapReduce Application Master: coordinates the tasks that run the MapReduce job.


1 Answer

Short Answer

It most probably doesn't matter if you are just running Hadoop out of the box on your single-node cluster, or even on a small personal distributed cluster. You just need to worry about memory.

Long Answer

vCores are used on larger clusters in order to limit CPU for different users or applications. If you are using YARN for yourself, there is no real reason to limit your container CPU. That is why vCores are not even taken into consideration by default in Hadoop!

Try setting your available NodeManager vcores to 1. It doesn't matter! Your number of containers will still be 2 or 4 ... or whatever the value of:

yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb
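
With the numbers from the question (40 GB of NodeManager memory = 40960 MB), that works out to:

40960 / 4096 = 10 concurrent map containers per node
40960 / 8192 = 5 concurrent reduce containers per node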

If you really do want the number of containers to take vCores into consideration and be limited by:

yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores

then you need to use a different ResourceCalculator. Go to your capacity-scheduler.xml config and change DefaultResourceCalculator to DominantResourceCalculator.
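
Concretely, that change in capacity-scheduler.xml looks like this:

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```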

Beyond using vCores for container allocation, do you also want to use vCores to actually limit the CPU usage of each node? Then you need to change even more configurations, to use the LinuxContainerExecutor instead of the DefaultContainerExecutor, because it can manage Linux cgroups, which are used to limit CPU resources. Follow this page if you want more info on this.
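
As a minimal sketch of what that involves in yarn-site.xml (the container-executor binary setup and cgroup mount details are omitted, and the group value is an assumption for illustration):

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- assumption: the Unix group your NodeManager runs under -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
```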

answered Sep 18 '22 by Nicomak