Spark - Container is running beyond physical memory limits

Tags:

I have a cluster of two worker nodes. Worker_Node_1 - 64GB RAM Worker_Node_2 - 32GB RAM

Background Summery : I am trying to execute spark-submit on yarn-cluster to run Pregel on a Graph to calculate the shortest path distances from one source vertex to all other vertices and print the values on console. Experment :

For Small graph with 15 vertices execution completes application final status : SUCCEEDED
My code works perfectly and prints shortest distance for 241 vertices graph for single vertex as source vertex but there is a problem.

Problem : When I dig into the Log file the task gets complete successfully in 4 mins and 26 Secs but still on the terminal it keeps on showing application status as Running and after approx 12 more minutes task execution terminates saying -

Application application_1447669815913_0002 failed 2 times due to AM Container for appattempt_1447669815913_0002_000002 exited with exitCode: -104 For more detailed output, check application tracking page:http://myserver.com:8088/proxy/application_1447669815913_0002/
Then, click on links to logs of each attempt. 
Diagnostics: Container [pid=47384,containerID=container_1447669815913_0002_02_000001] is running beyond physical memory limits. Current usage: 17.9 GB of 17.5 GB physical memory used; 18.7 GB of 36.8 GB virtual memory used. Killing container.

Dump of the process-tree for container_1447669815913_0002_02_000001 : 
 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 47387 47384 47384 47384 (java) 100525 13746 20105633792 4682973 /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -Xmx16384m -Djava.io.tmpdir=/yarn/nm/usercache/cloudera/appcache/application_1447669815913_0002/container_1447669815913_0002_02_000001/tmp -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=hdfs://myserver.com:8020/user/spark/applicationHistory -Dspark.executor.memory=14g -Dspark.shuffle.service.enabled=false -Dspark.yarn.executor.memoryOverhead=2048 -Dspark.yarn.historyServer.address=http://myserver.com:18088 -Dspark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native -Dspark.shuffle.service.port=7337 -Dspark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/lib/spark-assembly.jar -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.authenticate=false -Dspark.app.name=com.path.PathFinder -Dspark.master=yarn-cluster -Dspark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native -Dspark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class com.path.PathFinder --jar file:/home/cloudera/Documents/Longest_Path_Data_1/Jars/ShortestPath_Loop-1.0.jar --arg /home/cloudera/workspace/Spark-Integration/LongestWorstPath/configFile --executor-memory 14336m --executor-cores 32 --num-executors 2
|- 47384 47382 47384 47384 (bash) 2 0 17379328 853 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native::/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -Xmx16384m -Djava.io.tmpdir=/yarn/nm/usercache/cloudera/appcache/application_1447669815913_0002/container_1447669815913_0002_02_000001/tmp '-Dspark.eventLog.enabled=true' '-Dspark.eventLog.dir=hdfs://myserver.com:8020/user/spark/applicationHistory' '-Dspark.executor.memory=14g' '-Dspark.shuffle.service.enabled=false' '-Dspark.yarn.executor.memoryOverhead=2048' '-Dspark.yarn.historyServer.address=http://myserver.com:18088' '-Dspark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native' '-Dspark.shuffle.service.port=7337' '-Dspark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/lib/spark-assembly.jar' '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer' '-Dspark.authenticate=false' '-Dspark.app.name=com.path.PathFinder' '-Dspark.master=yarn-cluster' '-Dspark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native' '-Dspark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.path.PathFinder' --jar file:/home/cloudera/Documents/Longest_Path_Data_1/Jars/ShortestPath_Loop-1.0.jar --arg '/home/cloudera/workspace/Spark-Integration/LongestWorstPath/configFile' --executor-memory 14336m --executor-cores 32 --num-executors 2 1> /var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001/stdout 2> /var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.

Things I tried :

yarn.schedular.maximum-allocation-mb – 32GB
mapreduce.map.memory.mb = 2048 (Previously it was 1024)
Tried varying --driver-memory upto 24g

Could you please put more color on to how I can configure the Resource Manager so that Large Size Graphs ( > 300K vertices) can also be processed? Thanks.

639

asked Nov 17 '15 14:11

aditya

3 Answers

Just increase default conf of spark.driver.memory from 512m to 2g solve this error in my case.

You may set the memory to higher if it keeps hitting the same error. Then, you can keep reducing it until it hits the same error so that you know the optimum driver memory to use for your job.

119

answered Oct 04 '22 13:10

Wong Tat Yau

The more data you are processing, the more memory is needed by each Spark task. And if your executor is running too many tasks then it can run out of memory. When I had problems processing large amounts of data, it usually was a result of not properly balancing the number of cores per executor. Try to either reduce the number of cores or increase the executor memory.

One easy way to tell that you are having memory issues is to check the Executor tab on the Spark UI. If you see a lot of red bars indicating high garbage collection time, you are probably running out of memory in your executors.

answered Oct 04 '22 14:10

Ted

I slove the error in my case to increase conf of spark.yarn.executor.memoryOverhead Which stand for off-heap memory When you increase the amount of driver-memory and executor-memory, do not forget this config item

answered Oct 04 '22 14:10

Gavin Gu

Related questions
                            
                                Difference between `load data inpath ` and `location` in hive?
                            
                                Meaning of fs.defaultFS property in core-site.xml in hadoop
                            
                                How to use two versions of spark shell?
                            
                                Repair HBase table (unassigned region in transition)
                            
                                get "ERROR: Can't get master address from ZooKeeper; znode data == null" when using Hbase shell
                            
                                Get a yarn configuration from commandline
                            
                                Spark: Inconsistent performance number in scaling number of cores
                            
                                On what basis mapreduce framework decides whether to launch a combiner or not
                            
                                Could not find or load main class org.apache.hadoop.hdfs.server.namenode.Namenode
                            
                                no namenode error in pseudo-mode
                            
                                Hive creating a table but getting FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
                            
                                High throughput vs low latency in HDFS
                            
                                terminating a spark step in aws
                            
                                Hadoop: compress file in HDFS?
                            
                                How to delete duplicate records from Hive table?
                            
                                Wiping out DFS in Hadoop
                            
                                Hadoop FileSystem closed exception when doing BufferedReader.close()
                            
                                Redux: How do I get Jython to use Python modules stored in Lib within its own jar file when running in Hadoop?
                            
                                HBase & Mahout - Using HBase as a Datastore/source for Mahout - Classification

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Spark - Container is running beyond physical memory limits

Tags:

apache-spark

hadoop

spark-graphx

aditya

People also ask

3 Answers

Wong Tat Yau

Ted

Gavin Gu

Recent Activity

Donate For Us