I am running a Spark job written in Scala, but it gets stuck and my worker nodes do not execute any tasks.
Currently I am submitting this to Livy, which submits it to our Spark cluster with 8 cores and 12 GB of RAM, using the following configuration:
data = {
    'file': bar_jar.format(bucket_name),
    'className': 'com.bar.me',
    'jars': [
        common_jar.format(bucket_name),
    ],
    'args': [
        bucket_name,
        spark_master,
        data_folder
    ],
    'name': 'Foo',
    'driverMemory': '2g',
    'executorMemory': '9g',
    'driverCores': 1,
    'executorCores': 1,
    'conf': {
        'spark.driver.memoryOverhead': '200',
        'spark.executor.memoryOverhead': '200',
        'spark.submit.deployMode': 'cluster'
    }
}
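For completeness, this is roughly how the payload above is POSTed to Livy's /batches endpoint; the livy_url value below is a placeholder, not my real endpoint.

# Rough sketch of the submission call, assuming the data dict above and a
# Livy server at livy_url (placeholder value).
import json
import requests

livy_url = 'http://livy-host:8998'   # placeholder Livy endpoint

response = requests.post(
    livy_url + '/batches',
    data=json.dumps(data),
    headers={'Content-Type': 'application/json'}
)
print(response.status_code, response.json())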
The node logs are then endlessly filled with:
2019-03-29T22:24:32.119+0000: [GC (Allocation Failure) 2019-03-29T22:24:32.119+0000:
[ParNew: 68873K->20K(77440K), 0.0012329 secs] 257311K->188458K(349944K),
0.0012892 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
The issue is that the next stages and tasks never execute, which makes this behavior quite unexpected.
An Allocation Failure happens when there isn't enough free space to create new objects in the young generation. An allocation failure triggers a minor GC (a minor collection) to free up space in the heap for the allocation request. With a minor GC, space is freed so that the new allocation can be made in the young generation.
A GC allocation failure also means that the garbage collector could not promote objects from the young generation to the old generation fast enough because the old generation does not have enough free memory. This can cause application slowness.
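If the young generation really is the bottleneck, one knob to experiment with is the executor JVM's young-generation size. This is only a sketch, assuming a HotSpot JVM; the 2 GB value is illustrative, not a tuned recommendation:

# Add to the 'conf' block of the Livy payload shown in the question.
data['conf'].update({
    # Illustrative: give the executor JVM a fixed 2 GB young generation
    # out of its 9 GB heap so minor collections run less often.
    'spark.executor.extraJavaOptions': '-Xmn2g'
})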
One commonly-used approach to avoid full GC in G1 GC is to decrease the InitiatingHeapOccupancyPercent value (the default is 45) so that G1 GC starts its initial concurrent marking earlier, which gives it a better chance of finishing before the heap fills up and therefore of avoiding a full GC.
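For example, a sketch of what that could look like in the Livy payload, assuming you also switch the executors over to G1 (the value 35 is illustrative, not a tuned number):

data['conf'].update({
    # Illustrative: enable G1 on the executors and start concurrent
    # marking at 35% heap occupancy instead of the default 45%.
    'spark.executor.extraJavaOptions':
        '-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35'
})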
Determine the memory resources available to the Spark application: multiply the cluster RAM size by the YARN utilization percentage. For example, this might leave 5 GB of RAM available for drivers and 50 GB of RAM available for worker nodes. Then discount 1 core per worker node to determine the number of cores available for executors.
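Spelled out as arithmetic for the 8-core / 12 GB node from the question (the YARN utilization figure is an assumption, not a measured value):

# Back-of-the-envelope sizing; numbers are illustrative.
cluster_ram_gb = 12                # RAM per worker node (from the question)
cores_per_node = 8                 # cores per worker node (from the question)
yarn_utilization = 0.75            # assumed fraction of RAM YARN may hand out

available_ram_gb = cluster_ram_gb * yarn_utilization   # ~9 GB for containers
executor_cores = cores_per_node - 1                    # reserve 1 core per node

print(available_ram_gb, executor_cores)                # 9.0 GB, 7 cores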
It is apparently a normal GC event:
This ‘Allocation failure’ log is not an error but a totally normal case in the JVM. It is a typical GC event that causes the Java garbage collection process to be triggered. Garbage collection removes dead objects and compacts reclaimed memory, and thus helps free up memory for new object allocations.
Source: https://medium.com/@technospace/gc-allocation-failures-42c68e8e5e04
Edit: If the next stages are not executing, maybe you should check stderr instead of stdout.
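Since the job goes through Livy, one way to pull the driver output is the batch log endpoint; this is only a sketch, with livy_url and batch_id as placeholders. On YARN, the full stderr is also available from the container logs in the YARN UI.

# Sketch only: livy_url and batch_id are placeholders.
import requests

livy_url = 'http://livy-host:8998'
batch_id = 42

resp = requests.get(
    '{}/batches/{}/log'.format(livy_url, batch_id),
    params={'from': 0, 'size': 200}
)
for line in resp.json().get('log', []):
    print(line)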
The following link provides a description of how to allocate executor memory:
https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
I found it very useful, but found that the following parameters need to be updated as per our application requirements.
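As a rough illustration (not necessarily the exact parameters the answer above had in mind), the blog's sizing approach applied to an 8-core / 12 GB node might look like this; the node shape comes from the question, everything else is an assumption:

# Illustrative sizing along the lines of the AWS blog linked above.
vcores_per_node = 8
ram_per_node_gb = 12

executor_cores = min(5, vcores_per_node - 1)                 # ~5 cores per executor
executors_per_node = max(1, (vcores_per_node - 1) // executor_cores)
total_exec_mem_gb = ram_per_node_gb / executors_per_node     # RAM per executor
executor_memory_gb = int(total_exec_mem_gb * 0.9)            # ~90% as heap
memory_overhead_gb = total_exec_mem_gb - executor_memory_gb  # remainder as overhead

print(executor_cores, executors_per_node,
      '{}g heap + {}g overhead'.format(executor_memory_gb, memory_overhead_gb))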