GC Logs Overwritten when JVM Crashes

Question

I'm tuning our product for G1GC, and as part of that testing, I'm experiencing regular segfaults on my Spark Workers, which of course causes the JVM to crash. When this happens, the Spark Worker/Executor JVM automagically restarts itself, which then overwrites the GC logs that were written for the previous Executor JVM.

To be honest, I'm not quite sure the mechanism for how the Executor JVM restarts itself, but I launch the Spark Driver service via init.d, which in turn calls off to a bash script. I do use a timestamp in that script that gets appended to the GC log filename:

today=$(date +%Y%m%dT%H%M%S%3N)

SPARK_HEAP_DUMP="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_LOG_HOME}/heapdump_$$_${today}.hprof"
SPARK_GC_LOGS="-Xloggc:${SPARK_LOG_HOME}/gc_${today}.log -XX:LogFile=${SPARK_LOG_HOME}/safepoint_${today}.log"

GC_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintFlagsFinal -XX:+PrintJNIGCStalls -XX:+PrintTLAB -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=15 -XX:GCLogFileSize=48M -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintReferenceGC -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"

I think the problem is that this script sends these options along to the Spark Driver, which then passes them off to the Spark Executors (via the -Dspark.executor.extraJavaOptions argument), which are all separate servers, and when the Executor JVM crashes, it simply uses the command that had been originally sent to start back up, which would mean that the timestamp portion of the GC log filename is static:

SPARK_STANDALONE_OPTS=`property ${SPARK_APP_CONFIG}/spark.properties "spark-standalone.extra.args"`
SPARK_STANDALONE_OPTS="$SPARK_STANDALONE_OPTS $GC_OPTS $SPARK_GC_LOGS $SPARK_HEAP_DUMP"

exec java ${SPARK_APP_HEAP_DUMP} ${GC_OPTS} ${SPARK_APP_GC_LOGS} \
    ${DRIVER_JAVA_OPTIONS} \
    -Dspark.executor.memory=${EXECUTOR_MEMORY} \
    -Dspark.executor.extraJavaOptions="${SPARK_STANDALONE_OPTS}" \
    -classpath ${CLASSPATH} \
    com.company.spark.Main >> ${SPARK_APP_LOGDIR}/${SPARK_APP_LOGFILE} 2>&1 &

This is making it difficult for me to debug the cause of the segfaults, since I'm losing the activity and state of the Workers that led up to the JVM crash. Any ideas for how I can handle this situation and keep the GC logs on the Workers, even after a JVM crash/segfault?

Ioannis Baourdos · Accepted Answer

If you are using Java 8 and above, you may consider getting away with it by adding %p to the log file name to introduce the PID which will be kind of unique per crash.

GC Logs Overwritten when JVM Crashes

Tags:

java

garbage-collection

apache-spark

g1gc

liltitus27

1 Answers

Ioannis Baourdos

Recent Activity

Donate For Us

GC Logs Overwritten when JVM Crashes

Tags:

java

garbage-collection

apache-spark

g1gc

liltitus27

1 Answers

Ioannis Baourdos

Related questions

Recent Activity

Donate For Us