Container is running beyond memory limits

In Hadoop v1, I assigned 7 slots each to mappers and reducers, with a size of 1 GB per slot, and my mappers and reducers ran fine. My machine has 8 GB of memory and 8 processors. Now with YARN, when I run the same application on the same machine, I get a container error. By default, I have these settings:

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>

It gave me this error:

Container [pid=28920,containerID=container_1389136889967_0001_01_000121] is running beyond virtual memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

I then tried to set the memory limits in mapred-site.xml:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>

But I still get an error:

Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical memory used; 5.2 GB of 8.4 GB virtual memory used. Killing container.

I'm confused about why the map task needs this much memory. As I understand it, 1 GB of memory is enough for my map/reduce tasks. Why does a task use more memory when I assign more to its container? Is it because each task gets more splits? I think it would be more efficient to shrink the containers a little and create more of them, so that more tasks run in parallel. The problem is: how can I make sure each container won't be assigned more splits than it can handle?

532

asked Jan 08 '14 20:01

Lishu

1 Answer

You should also properly configure the maximum memory allocations for MapReduce. From this HortonWorks tutorial:

[...]

Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System.

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.

In mapred-site.xml:

mapreduce.map.memory.mb: 4096

mapreduce.reduce.memory.mb: 8192

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

mapreduce.map.java.opts: -Xmx3072m

mapreduce.reduce.java.opts: -Xmx6144m

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
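Written out in the XML form used in the question, the tutorial's four settings might look like the fragment below. This is only a sketch: the values are the tutorial's, sized for its 48 GB example nodes, and would need to be scaled down for a smaller machine. The `<configuration>` wrapper is the usual root element of a Hadoop site file and is not part of the quoted text.

```xml
<!-- mapred-site.xml: values copied from the tutorial's 48 GB example node -->
<configuration>
  <!-- Container size requested from YARN for each map task -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <!-- Map JVM heap, kept below the container size (3072 of 4096 MB) -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3072m</value>
  </property>
  <!-- Container size requested from YARN for each reduce task -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>
  </property>
  <!-- Reduce JVM heap, kept below the container size (6144 of 8192 MB) -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6144m</value>
  </property>
</configuration>
```

In both pairs the heap is 75% of the container, which leaves the remaining quarter for JVM overhead outside the heap.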

To sum it up:

In YARN, you should use the mapreduce configs, not the mapred ones. EDIT: This comment is not applicable anymore now that you've edited your question.
What you are configuring with the memory.mb settings is how much memory to request for each container, not the maximum that can be allocated.
The max limits are configured with the java.opts settings listed above.
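The virtual memory limits in the question's two error messages (2.1 GB for a 1 GB container, 8.4 GB for a 4 GB one) are consistent with YARN's `yarn.nodemanager.vmem-pmem-ratio` setting, whose default is 2.1. That property does not appear in the question, so the fragment below assumes the default is in effect and shows it only to make the arithmetic visible.

```xml
<!-- yarn-site.xml: shown with the stock default, not taken from the question -->
<configuration>
  <!-- Virtual memory allowed per unit of container physical memory.
       1 GB container x 2.1 = 2.1 GB; 4 GB container x 2.1 = 8.4 GB,
       matching the limits printed in the two error messages. -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
```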

Finally, you may want to check this other SO question that describes a similar problem (and solution).

155

answered Oct 04 '22 12:10

cabad


Tags: hadoop, mapreduce, hadoop-yarn, mrv2
