 

What is the maximum number of containers on a single-node cluster (Hadoop)?

I am new to Hadoop and not yet familiar with its configuration.

I just want to ask what the maximum number of containers per node is.

I am using a single-node cluster (a laptop with 6 GB of RAM),

and below is my mapred and yarn configuration:

**mapred-site.xml**
map-mb : 4096 opts:-Xmx3072m
reduce-mb : 8192 opts:-Xmx6144m

**yarn-site.xml**
resource memory-mb : 40GB
min allocation-mb : 1GB

With the above setup I can only run 4 to 5 jobs, and at most 8 containers.
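
In case my shorthand is unclear, I believe it corresponds to the following standard properties (values converted to MB):

mapreduce.map.memory.mb : 4096
mapreduce.map.java.opts : -Xmx3072m
mapreduce.reduce.memory.mb : 8192
mapreduce.reduce.java.opts : -Xmx6144m
yarn.nodemanager.resource.memory-mb : 40960
yarn.scheduler.minimum-allocation-mb : 1024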

asked Oct 24 '14 by Andoy Abarquez

People also ask

What is single node cluster in Hadoop?

A single-node cluster means only one DataNode is running, with the NameNode, DataNode, ResourceManager, and NodeManager all set up on a single machine. It is used for study and testing purposes.

What is Hadoop cluster container?

A Container represents an allocated resource in the cluster. The ResourceManager is the sole authority that allocates Containers to applications. An allocated Container is always on a single node, has a unique ContainerId, and has a specific amount of Resource allocated to it.

How can the size of a Hadoop cluster be increased?

The most common practice is to size a Hadoop cluster based on the amount of storage required. The more data in the system, the more machines are required. Each time you add a new node to the cluster, you get more computing resources in addition to the new storage capacity.
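
As a rough, purely illustrative calculation (the numbers here are assumptions, not recommendations): storing 100 TB of data with the default HDFS replication factor of 3 and about 25% headroom for intermediate output needs roughly 100 × 3 × 1.25 ≈ 375 TB of raw disk; with nodes holding 12 × 4 TB = 48 TB each, that works out to on the order of 8 machines.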

What are the 3 builds of Hadoop?

Apache Hadoop (HDFS and YARN), Apache HBase, and Apache Spark.


1 Answer

The maximum number of containers that run on a single NodeManager (Hadoop worker) depends on a lot of factors, such as how much memory the NodeManager is assigned to use, and also on application-specific requirements.

The defaults for the yarn.scheduler.*-allocation-* properties are: 1 GB (minimum allocation), 8 GB (maximum allocation), 1 core (minimum) and 32 cores (maximum). So the minimum and maximum allocations affect the number of containers per node.
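
For example (illustrative numbers only): if a NodeManager advertises 8 GB (yarn.nodemanager.resource.memory-mb: 8192) and the minimum allocation is 1 GB, the scheduler can place at most 8192 / 1024 = 8 minimum-size containers on that node; if every container instead requests 2 GB, at most 4 fit.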

So, if you have 6 GB of RAM and 4 virtual cores, here is how the YARN configuration should look:

yarn.scheduler.minimum-allocation-mb: 128
yarn.scheduler.maximum-allocation-mb: 2048
yarn.scheduler.minimum-allocation-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 2
yarn.nodemanager.resource.memory-mb: 4096
yarn.nodemanager.resource.cpu-vcores: 4

The above configuration tells Hadoop to use at most 4 GB and 4 virtual cores, and that each container can have between 128 MB and 2 GB of memory and between 1 and 2 virtual cores. With these settings you could run up to 2 containers with maximum resources at a time.
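
If it helps, the same settings written out in yarn-site.xml would look roughly like this (standard Hadoop 2.x property names; the values are just the ones suggested above):

<configuration>
  <!-- total resources this NodeManager may hand out to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <!-- per-container allocation bounds enforced by the scheduler -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
  </property>
</configuration>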

Now, for the MapReduce-specific configuration:

yarn.app.mapreduce.am.resource.mb: 1024
yarn.app.mapreduce.am.command-opts: -Xmx768m
mapreduce.[map|reduce].cpu.vcores: 1
mapreduce.[map|reduce].memory.mb: 1024
mapreduce.[map|reduce].java.opts: -Xmx768m
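
Expanded into mapred-site.xml form, that is roughly the following (only the map entries are shown; the reduce entries mirror them with the same values):

<configuration>
  <!-- MapReduce ApplicationMaster container -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx768m</value>
  </property>
  <!-- per map-task container; repeat as mapreduce.reduce.* for reducers -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx768m</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
  </property>
</configuration>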

With this configuration, you could theoretically have up to 4 mappers/reducers running simultaneously in four 1 GB containers. In practice, the MapReduce application master will use one 1 GB container itself, so the actual number of concurrent mappers and reducers will be limited to 3. You can play around with the memory limits, but it might require some experimentation to find the best values.

As a rule of thumb, you should limit the heap-size to about 75% of the total memory available to ensure things run more smoothly.
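
That is where the -Xmx768m values above come from: 1024 MB × 0.75 ≈ 768 MB of heap for a 1024 MB container, leaving the remaining ~25% of the container for the JVM's own non-heap overhead.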

You could also set memory per container using yarn.scheduler.minimum-allocation-mb property.

For more detailed configuration of production systems, use this document from Hortonworks as a reference.

answered Nov 15 '22 by Ashrith