I am new to Hadoop and not yet familiar with its configuration.
I just want to ask about the maximum number of containers per node.
I am using a single-node cluster (a laptop with 6 GB of RAM),
and below is my mapred and yarn configuration:
**mapred-site.xml**
map-mb : 4096 opts:-Xmx3072m
reduce-mb : 8192 opts:-Xmx6144m
**yarn-site.xml**
resource memory-mb : 40GB
min allocation-mb : 1GB
The above setup can only run 4 to 5 jobs, with a maximum of 8 containers.
A single-node cluster runs only one DataNode, with the NameNode, DataNode, ResourceManager, and NodeManager all set up on a single machine. It is used for study and testing purposes.
A Container represents an allocated resource in the cluster. The ResourceManager is the sole authority for allocating any Container to applications. An allocated Container is always on a single node and has a unique ContainerId. It has a specific amount of Resource allocated.
The most common way to size a Hadoop cluster is by the amount of storage required. The more data goes into the system, the more machines are required. Each time you add a new node to the cluster, you get more computing resources in addition to the new storage capacity.
The maximum number of containers that can run on a single NodeManager (Hadoop worker) depends on a lot of factors, such as how much memory is assigned to the NodeManager and the application's specific requirements.
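The defaults for yarn.scheduler.*-allocation-* are: 1 GB (minimum allocation), 8 GB (maximum allocation), 1 core and 32 cores. These defaults can differ between Hadoop versions, so check yarn-default.xml for your release. The minimum and maximum allocations therefore affect the number of containers per node.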
So, if you have 6 GB of RAM and 4 virtual cores, here is how the YARN configuration could look:
yarn.scheduler.minimum-allocation-mb: 128
yarn.scheduler.maximum-allocation-mb: 2048
yarn.scheduler.minimum-allocation-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 2
yarn.nodemanager.resource.memory-mb: 4096
yarn.nodemanager.resource.cpu-vcores: 4
The above configuration tells Hadoop to use at most 4 GB and 4 virtual cores, and that each container can have between 128 MB and 2 GB of memory and between 1 and 2 virtual cores. With these settings, you could run up to 2 containers with maximum resources at a time.
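The arithmetic behind that container count can be sketched in a few lines of Python. This is only an illustration: the function name `max_containers` is made up here, and it assumes the scheduler accounts for both memory and vcores (with a memory-only resource calculator, the vcore limit would not apply).

```python
def max_containers(node_mb, node_vcores, container_mb, container_vcores):
    """Upper bound on same-sized containers that fit on one node.

    Assumes both memory and vcores are enforced by the scheduler.
    """
    by_memory = node_mb // container_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Node limits from the yarn-site.xml values above: 4096 MB, 4 vcores.
largest = max_containers(4096, 4, 2048, 2)   # containers at maximum size
smallest = max_containers(4096, 4, 128, 1)   # containers at minimum size
print(largest, smallest)
```

With the minimum-sized containers, memory alone would allow 32, so in this sketch the 4 vcores become the limiting factor.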
Now, for MapReduce specific configuration:
yarn.app.mapreduce.am.resource.mb: 1024
yarn.app.mapreduce.am.command-opts: -Xmx768m
mapreduce.[map|reduce].cpu.vcores: 1
mapreduce.[map|reduce].memory.mb: 1024
mapreduce.[map|reduce].java.opts: -Xmx768m
With this configuration, you could theoretically have up to 4 mappers/reducers running simultaneously in four 1 GB containers. In practice, the MapReduce application master will use a 1 GB container, so the actual number of concurrent mappers and reducers will be limited to 3. You can play around with the memory limits, but it might require some experimentation to find the best ones.
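As a rough check of that count, here is a small Python sketch that subtracts the application master's container from the node's memory before dividing by the task size. The helper name `concurrent_tasks` is hypothetical, and it considers memory only for a single running job.

```python
def concurrent_tasks(node_mb, am_mb, task_mb):
    """Map/reduce containers that fit once the application master is placed."""
    remaining_mb = node_mb - am_mb
    return max(remaining_mb // task_mb, 0)


# Values from the configuration above:
# yarn.nodemanager.resource.memory-mb = 4096
# yarn.app.mapreduce.am.resource.mb   = 1024
# mapreduce.[map|reduce].memory.mb    = 1024
tasks = concurrent_tasks(4096, 1024, 1024)
print(tasks)
```

Each additional job running at the same time needs its own application master, which takes another container away from the mappers and reducers.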
As a rule of thumb, you should limit the JVM heap size to about 75% of the container's memory to ensure things run more smoothly; the rest is left for non-heap JVM overhead.
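The 75% rule is easy to apply mechanically. The following Python sketch (the function name `heap_opts` is made up for illustration) derives the `-Xmx` value from a container size:

```python
def heap_opts(container_mb, fraction=0.75):
    """Return a -Xmx option sized to a fraction of the container memory."""
    return f"-Xmx{int(container_mb * fraction)}m"


am_opts = heap_opts(1024)      # for yarn.app.mapreduce.am.command-opts
map_opts = heap_opts(4096)     # the 4096 MB map container from the question
reduce_opts = heap_opts(8192)  # the 8192 MB reduce container from the question
print(am_opts, map_opts, reduce_opts)
```

The values in the question (3072m for 4096 MB and 6144m for 8192 MB) already follow this ratio, so the heap sizes there are consistent with the container sizes; the container sizes themselves are what exceed the laptop's 6 GB.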
You could also influence the memory per container through the yarn.scheduler.minimum-allocation-mb property, which sets the smallest container the scheduler will hand out.
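To illustrate how the minimum allocation shapes container sizes, here is a hedged Python sketch. It assumes the scheduler rounds each request up to a multiple of the minimum allocation and rejects requests above the maximum; the exact behavior depends on the scheduler in use (some use a separate increment setting), and the function name `normalize_request` is made up for this example.

```python
import math


def normalize_request(request_mb, min_mb, max_mb):
    """Round a memory request up to a multiple of the minimum allocation.

    Assumption: requests above the maximum are rejected, not shrunk.
    """
    if request_mb > max_mb:
        raise ValueError(f"{request_mb} MB exceeds the maximum of {max_mb} MB")
    multiples = max(math.ceil(request_mb / min_mb), 1)
    return multiples * min_mb


small_min = normalize_request(1000, 128, 2048)    # fine-grained minimum
large_min = normalize_request(1500, 1024, 8192)   # 1 GB minimum (the default)
print(small_min, large_min)
```

Under this assumption, a small minimum wastes less memory per container, which is one reason the configuration above lowers it to 128 MB on a small machine.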
For more detailed configuration of production systems, use this document from Hortonworks as a reference.