I've searched and haven't found much information about Hadoop DataNode processes dying due to "GC overhead limit exceeded", so I thought I'd post a question.
We are running a test where we need to confirm our Hadoop cluster can handle having ~3 million files stored on it (currently a 4-node cluster). We are using a 64-bit JVM and we've allocated 8 GB to the NameNode. However, as my test program writes more files to DFS, the DataNodes start dying off with this error:
Exception in thread "DataNode: [/var/hadoop/data/hadoop/data]" java.lang.OutOfMemoryError: GC overhead limit exceeded
I saw some posts about some options (parallel GC?) that I guess can be set in hadoop-env.sh, but I'm not sure of the syntax and I'm kind of a newbie, so I didn't quite grok how it's done. Thanks for any help here!
The "OutOfMemoryError: GC overhead limit exceeded" error indicates that the NameNode heap size is insufficient for the amount of HDFS data in the cluster. Increase the heap size to prevent out-of-memory exceptions.
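As a rough sketch, assuming the heap is configured through hadoop-env.sh (the variable name below is the conventional one and may differ across Hadoop versions), you could raise the NameNode heap like this:

# $HADOOP_CONF_DIR/hadoop-env.sh
# Raise the NameNode heap; 8g is only an example value, size it to your metadata.
export HADOOP_NAMENODE_OPTS="-Xmx8g ${HADOOP_NAMENODE_OPTS}"

A restart of the NameNode is needed for the new heap size to take effect.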
From the root of the Eclipse folder, open eclipse.ini and change the default maximum heap size of -Xmx256m to -Xmx1024m on the last line. NOTE: If there is a lot of memory available on the machine, you can also try using -Xmx2048m as the maximum heap size.
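For illustration only, the tail of eclipse.ini might look like this after the change (the surrounding lines vary by Eclipse version):

-vmargs
-Xms256m
-Xmx1024m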
"GC overhead limit exceeded" message is something which cannot be truly removed by increasing the available memory. Rather GC should be put into a different mode (perhaps event different than suggested by me) to handle the situation properly.
Try to increase the memory for the DataNode by using this (a Hadoop restart is required for it to take effect):
export HADOOP_DATANODE_OPTS="-Xmx10g"
This will set the heap to 10 GB... you can increase it as per your need.
You can also paste this at the start of the $HADOOP_CONF_DIR/hadoop-env.sh file.
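A minimal sketch of applying the change, assuming a classic hadoop-daemon.sh style installation (script locations differ between Hadoop versions and distributions):

# After adding the export to hadoop-env.sh, restart the DataNode on each node:
$HADOOP_HOME/bin/hadoop-daemon.sh stop datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode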