Spark nodes keep printing GC (Allocation Failure) and no tasks run

I am running a Spark job written in Scala, but it gets stuck with no tasks executing on my worker nodes.

Currently I am submitting this to Livy, which submits to our Spark Cluster with 8 cores and 12GB of RAM with the following configuration:

data={
    'file': bar_jar.format(bucket_name),
    'className': 'com.bar.me',
    'jars': [
        common_jar.format(bucket_name),
    ],
    'args': [
        bucket_name,
        spark_master,
        data_folder
    ],
    'name': 'Foo',
    'driverMemory': '2g',
    'executorMemory': '9g',
    'driverCores': 1,
    'executorCores': 1,
    'conf': {
        'spark.driver.memoryOverhead': '200',
        'spark.executor.memoryOverhead': '200',
        'spark.submit.deployMode': 'cluster'
    }
}
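For context, a payload like the one above is what gets POSTed to Livy's /batches endpoint. A minimal sketch of assembling and submitting it (the jar path, bucket, and Livy host are placeholders, not values from the question; the actual POST is left commented out):

```python
import json

def build_batch_payload(jar, class_name, args):
    """Assemble a Livy batch payload; memory settings mirror the question."""
    return {
        "file": jar,
        "className": class_name,
        "args": args,
        "name": "Foo",
        "driverMemory": "2g",
        "executorMemory": "9g",
        "driverCores": 1,
        "executorCores": 1,
        "conf": {
            "spark.driver.memoryOverhead": "200",
            "spark.executor.memoryOverhead": "200",
            "spark.submit.deployMode": "cluster",
        },
    }

# Placeholder jar path, master URL, and data folder for illustration.
payload = build_batch_payload(
    "s3://my-bucket/bar.jar",
    "com.bar.me",
    ["my-bucket", "spark://master:7077", "data/"],
)

# In a real environment this would be submitted with requests, e.g.:
# import requests
# r = requests.post("http://livy-host:8998/batches",
#                   data=json.dumps(payload),
#                   headers={"Content-Type": "application/json"})
print(json.dumps(payload, indent=2))
```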

The node logs are then endlessly filled with:

2019-03-29T22:24:32.119+0000: [GC (Allocation Failure) 2019-03-29T22:24:32.119+0000:
[ParNew: 68873K->20K(77440K), 0.0012329 secs] 257311K->188458K(349944K), 
0.0012892 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

The issue is that the next stages and tasks are not executing, so the behavior is quite unexpected: tasks won't run.

asked Mar 29 '19 by Eric Meadows

People also ask

What causes GC allocation failure?

An allocation failure happens when there isn't enough free space to create new objects in the young generation. It triggers a minor GC (a minor collection) to free up space in the heap for the allocation request; once space is freed, the new allocation can be made in the young generation.

What is GC allocation failure in Databricks?

A GC allocation failure means that the garbage collector could not move objects from young gen to old gen fast enough because it does not have enough memory in old gen. This can cause application slowness.

How do I reduce the GC time on my Spark?

To avoid full GC in G1 GC, one commonly-used approach is to decrease the InitiatingHeapOccupancyPercent value (the default is 45) so that G1 GC starts its initial concurrent marking earlier, giving a higher chance of avoiding a full GC.
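In a Livy payload like the one in the question, such G1 options would be passed through the conf map as JVM options. A sketch (the threshold of 35 is an illustrative value, not a tuned recommendation):

```python
# Illustrative only: enable G1 GC and lower InitiatingHeapOccupancyPercent
# so concurrent marking starts earlier. Values are example settings.
conf = {
    "spark.executor.extraJavaOptions":
        "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35",
    "spark.driver.extraJavaOptions":
        "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35",
}
print(conf)
```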

How do you allocate driver memory and executor memory in Spark?

Determine the memory resources available for the Spark application by multiplying the cluster RAM size by the YARN utilization percentage; this might, for example, leave 5 GB of RAM for the driver and 50 GB of RAM for worker nodes. Discount 1 core per worker node when determining the executor core instances.
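The rule of thumb above can be turned into quick arithmetic for the 8-core / 12 GB cluster from the question. A sketch (the 90% YARN utilization fraction and the 10% overhead factor are assumed defaults, not figures from the answer):

```python
# Rough sizing arithmetic following the rule of thumb above.
cluster_ram_gb = 12
yarn_utilization = 0.9                        # assumed fraction YARN may hand out
usable_gb = cluster_ram_gb * yarn_utilization # 12 * 0.9 = 10.8 GB for containers

driver_gb = 2                                 # as configured in the question
executor_cores = 8 - 1                        # discount 1 core for node daemons

# memoryOverhead commonly defaults to max(384 MB, 10% of executor memory),
# so the 9g executor from the question needs roughly 9.9 GB of container room.
overhead_gb = max(0.384, 0.10 * 9)

per_executor_gb = usable_gb - driver_gb       # what is left for one executor
print(usable_gb, executor_cores, overhead_gb, per_executor_gb)
```

Run through like this, the question's 9g executor plus its overhead barely fits alongside a 2g driver on a 12 GB node, which is one plausible reason tasks never get scheduled.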


2 Answers

It is apparently a normal GC event:

This 'Allocation Failure' log is not an error but a totally normal case in the JVM. It is a typical GC event that triggers the Java garbage-collection process. Garbage collection removes dead objects and compacts reclaimed memory, thus helping to free up memory for new object allocations.

Source: https://medium.com/@technospace/gc-allocation-failures-42c68e8e5e04

Edit: If the next stages are not executing, you should probably check stderr instead of stdout.

answered Oct 19 '22 by Joy Yeh


The following link describes how to allocate executor memory:

https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/

I found it very useful, but also found that the following parameters

  1. spark.default.parallelism
  2. spark.sql.shuffle.partitions

need to be updated as per our application's requirements.
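These two parameters could also be set through the Livy conf map from the question. A sketch (the value of 2-3 tasks per core is a common rule of thumb, an assumption rather than a figure from the answer):

```python
# Illustrative: sizing shuffle parallelism at ~3 tasks per core on the
# 8-core cluster from the question.
total_cores = 8
conf = {
    "spark.default.parallelism": str(total_cores * 3),    # RDD shuffles
    "spark.sql.shuffle.partitions": str(total_cores * 3), # DataFrame/SQL shuffles
}
print(conf)
```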

answered Oct 19 '22 by Sabarish Sathasivan