
Hadoop Yarn Container Does Not Allocate Enough Space

Tags:

hadoop

I'm running a Hadoop job, and in my yarn-site.xml file, I have the following configuration:

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>

However, I still occasionally get the following error:

    Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.

I've found that increasing yarn.scheduler.minimum-allocation-mb raises the physical memory allocated to the container. However, I don't always want 4 GB allocated to my container, and I thought that explicitly specifying a maximum size would let me get around this problem. I realize that Hadoop can't know how much memory a container needs before the mapper runs, so how should I go about allocating more memory to a container only when it needs the extra memory?

Asked by Olshansk, Dec 27 '13 15:12



1 Answer

You also need to configure the memory allocations for MapReduce itself, not just the YARN scheduler limits. From a Hortonworks tutorial:

[...]

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.

In mapred-site.xml:

mapreduce.map.memory.mb: 4096

mapreduce.reduce.memory.mb: 8192
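As a rough sketch, the two settings above could be written in mapred-site.xml in the same property format as the yarn-site.xml snippet in the question. The values are the tutorial's example figures, not a recommendation for any particular cluster:

```xml
<!-- mapred-site.xml: sketch using the tutorial's example values -->
<configuration>
    <!-- Container size requested for each map task, in MB -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>4096</value>
    </property>
    <!-- Container size requested for each reduce task, in MB -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>8192</value>
    </property>
</configuration>
```

Note that with the question's yarn.scheduler.maximum-allocation-mb of 4096, a reduce request of 8192 would exceed the scheduler's cap, so either that value would need to be lowered or the maximum raised.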

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

mapreduce.map.java.opts: -Xmx3072m

mapreduce.reduce.java.opts: -Xmx6144m

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
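A minimal sketch of the heap settings in mapred-site.xml form, again using the tutorial's example values. In that example each heap is 75% of its container size (3072 of 4096 MB, 6144 of 8192 MB); the remainder is headroom for non-heap JVM memory, and the right margin for a given job is an assumption to verify rather than a fixed rule:

```xml
<!-- mapred-site.xml: JVM heap kept below the container size (tutorial's example values) -->
<configuration>
    <!-- Map task JVM heap: 3072 MB inside a 4096 MB container -->
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx3072m</value>
    </property>
    <!-- Reduce task JVM heap: 6144 MB inside an 8192 MB container -->
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx6144m</value>
    </property>
</configuration>
```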

Finally, someone on the Hadoop mailing list reported the same problem, and in their case it turned out to be a memory leak in their own code.

Answered by cabad, Oct 17 '22 18:10