I was playing with the distributed shell application (hadoop-2.0.0-cdh4.1.2). This is the error I'm receiving at the moment:
13/01/01 17:09:09 INFO distributedshell.Client: Got application report from ASM for, appId=5, clientToken=null, appDiagnostics=Application application_1357039792045_0005 failed 1 times due to AM Container for appattempt_1357039792045_0005_000001 exited with exitCode: 143 due to: Container [pid=24845,containerID=container_1357039792045_0005_01_000001] is running beyond virtual memory limits. Current usage: 77.8mb of 512.0mb physical memory used; 1.1gb of 1.0gb virtual memory used. Killing container.
Dump of the process-tree for container_1357039792045_0005_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 24849 24845 24845 24845 (java) 165 12 1048494080 19590 /usr/java/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --num_containers 1 --priority 0 --shell_command ping --shell_args localhost --debug
|- 24845 23394 24845 24845 (bash) 0 0 108654592 315 /bin/bash -c /usr/java/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --num_containers 1 --priority 0 --shell_command ping --shell_args localhost --debug 1>/tmp/logs/application_1357039792045_0005/container_1357039792045_0005_01_000001/AppMaster.stdout 2>/tmp/logs/application_1357039792045_0005/container_1357039792045_0005_01_000001/AppMaster.stderr
The interesting part is that there seems to be no problem with the setup, since a simple ls or uname command completed successfully and its output was available in the container 2 stdout.
Regarding the setup, yarn.nodemanager.vmem-pmem-ratio is 3 and the total physical memory available is 2 GB, which I think is more than enough for the example to run.
For the command in question, "ping localhost" generated two replies, as can be seen from containerlogs/container_1357039792045_0005_01_000002/721917/stdout/?start=-4096.
So, what could be the problem?
No need to change the cluster configuration. I found out that just providing the extra parameter -Dmapreduce.map.memory.mb=4096 to distcp worked for me.
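For example, the full distcp invocation could look roughly like this (the HDFS paths are placeholders, not taken from the question):

hadoop distcp -Dmapreduce.map.memory.mb=4096 hdfs://source-cluster/src/path hdfs://target-cluster/dst/path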
If you are running the Tez framework, you must set the parameters below in tez-site.xml:
tez.am.resource.memory.mb
tez.task.resource.memory.mb
tez.am.java.opts
And in yarn-site.xml:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.vmem-check-enabled
yarn.nodemanager.vmem-pmem-ratio
All these parameters are mandatory to set; a sketch of what the entries could look like is shown below.
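A minimal sketch of the corresponding entries, assuming a small cluster; every memory value here is only illustrative and has to be sized for your own nodes:

<!-- tez-site.xml -->
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>2048</value>
  <!-- memory requested for the Tez ApplicationMaster container (illustrative) -->
</property>
<property>
  <name>tez.task.resource.memory.mb</name>
  <value>1024</value>
  <!-- memory requested per Tez task container (illustrative) -->
</property>
<property>
  <name>tez.am.java.opts</name>
  <value>-Xmx1640m</value>
  <!-- JVM heap for the AM, typically around 80% of tez.am.resource.memory.mb -->
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
  <!-- total memory a NodeManager may hand out to containers (illustrative) -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <!-- or leave this true and raise the vmem-pmem ratio below instead -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>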
From the error message, you can see that you're using more virtual memory than your current limit of 1.0gb. This can be resolved in two ways:
Disable Virtual Memory Limit Checking
YARN will simply ignore the limit; in order to do this, add this to your yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>
The default for this setting is true.
Increase Virtual Memory to Physical Memory Ratio
In your yarn-site.xml, change this to a higher value than is currently set:
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.</description>
</property>
The default is 2.1.
You could also increase the amount of physical memory you allocate to a container.
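For the distributed shell example in the question, that would mean requesting bigger allocations on the client command line. A rough sketch, where the jar path and memory sizes are placeholders and the option names may vary slightly between Hadoop versions:

hadoop jar hadoop-yarn-applications-distributedshell-*.jar \
    org.apache.hadoop.yarn.applications.distributedshell.Client \
    -jar hadoop-yarn-applications-distributedshell-*.jar \
    -shell_command ping -shell_args localhost \
    -master_memory 1024 -container_memory 256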
Make sure you don't forget to restart yarn after you change the config.
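On a cluster started with the stock Hadoop scripts, that could be something like the following (an assumption about your setup; with CDH packages or Cloudera Manager you would restart the hadoop-yarn-resourcemanager and hadoop-yarn-nodemanager services instead):

# on the ResourceManager host
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh start resourcemanager
# on every NodeManager host
sbin/yarn-daemon.sh stop nodemanager
sbin/yarn-daemon.sh start nodemanager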