I am running a Spark job written in Scala, but it gets stuck and my worker nodes do not execute any tasks.
Currently I am submitting this to Livy, which submits it to our Spark cluster with 8 cores and 12 GB of RAM, using the following configuration:
data = {
    'file': bar_jar.format(bucket_name),
    'className': 'com.bar.me',
    'jars': [
        common_jar.format(bucket_name),
    ],
    'args': [
        bucket_name,
        spark_master,
        data_folder
    ],
    'name': 'Foo',
    'driverMemory': '2g',
    'executorMemory': '9g',
    'driverCores': 1,
    'executorCores': 1,
    'conf': {
        'spark.driver.memoryOverhead': '200',
        'spark.executor.memoryOverhead': '200',
        'spark.submit.deployMode': 'cluster'
    }
}
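For completeness, this is roughly how the payload above is POSTed to Livy's /batches endpoint; the livy_url value below is a placeholder, not my real endpoint.

# Rough sketch of the submission call, assuming the data dict above and a
# Livy server at livy_url (placeholder value).
import json
import requests

livy_url = 'http://livy-host:8998'   # placeholder Livy endpoint

response = requests.post(
    livy_url + '/batches',
    data=json.dumps(data),
    headers={'Content-Type': 'application/json'}
)
print(response.status_code, response.json())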
The node logs are then endlessly filled with:
2019-03-29T22:24:32.119+0000: [GC (Allocation Failure) 2019-03-29T22:24:32.119+0000:
[ParNew: 68873K->20K(77440K), 0.0012329 secs] 257311K->188458K(349944K),
0.0012892 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
The issue is that the next stages and tasks never execute, which makes this behavior quite unexpected.
An Allocation Failure happens when there isn't enough free space to create new objects in the young generation. An allocation failure triggers a minor GC (a minor collection) to free up space in the heap for the allocation request. With a minor GC, space is freed so that the new allocation can be made in the young generation.
A GC allocation failure also means that the garbage collector could not promote objects from the young generation to the old generation fast enough because the old generation does not have enough free memory. This can cause application slowness.
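If the young generation really is the bottleneck, one knob to experiment with is the executor JVM's young-generation size. This is only a sketch, assuming a HotSpot JVM; the 2 GB value is illustrative, not a tuned recommendation:

# Add to the 'conf' block of the Livy payload shown in the question.
data['conf'].update({
    # Illustrative: give the executor JVM a fixed 2 GB young generation
    # out of its 9 GB heap so minor collections run less often.
    'spark.executor.extraJavaOptions': '-Xmn2g'
})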
One commonly-used approach to avoid full GC in G1 GC is to decrease the InitiatingHeapOccupancyPercent value (the default is 45) so that G1 GC starts its initial concurrent marking earlier, which gives it a better chance of finishing before the heap fills up and therefore of avoiding a full GC.
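For example, a sketch of what that could look like in the Livy payload, assuming you also switch the executors over to G1 (the value 35 is illustrative, not a tuned number):

data['conf'].update({
    # Illustrative: enable G1 on the executors and start concurrent
    # marking at 35% heap occupancy instead of the default 45%.
    'spark.executor.extraJavaOptions':
        '-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35'
})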
Determine the memory resources available to the Spark application: multiply the cluster RAM size by the YARN utilization percentage. For example, this might leave 5 GB of RAM available for drivers and 50 GB of RAM available for worker nodes. Then discount 1 core per worker node to determine the number of cores available for executors.
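Spelled out as arithmetic for the 8-core / 12 GB node from the question (the YARN utilization figure is an assumption, not a measured value):

# Back-of-the-envelope sizing; numbers are illustrative.
cluster_ram_gb = 12                # RAM per worker node (from the question)
cores_per_node = 8                 # cores per worker node (from the question)
yarn_utilization = 0.75            # assumed fraction of RAM YARN may hand out

available_ram_gb = cluster_ram_gb * yarn_utilization   # ~9 GB for containers
executor_cores = cores_per_node - 1                    # reserve 1 core per node

print(available_ram_gb, executor_cores)                # 9.0 GB, 7 cores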
It is apparently a normal GC event:
This ‘Allocation failure’ log is not an error but a totally normal case in the JVM. It is a typical GC event that causes the Java garbage collection process to be triggered. Garbage collection removes dead objects and compacts reclaimed memory, and thus helps free up memory for new object allocations.
Source: https://medium.com/@technospace/gc-allocation-failures-42c68e8e5e04
Edit: If the next stages are not executing, maybe you should check stderr instead of stdout.
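Since the job goes through Livy, one way to pull the driver output is the batch log endpoint; this is only a sketch, with livy_url and batch_id as placeholders. On YARN, the full stderr is also available from the container logs in the YARN UI.

# Sketch only: livy_url and batch_id are placeholders.
import requests

livy_url = 'http://livy-host:8998'
batch_id = 42

resp = requests.get(
    '{}/batches/{}/log'.format(livy_url, batch_id),
    params={'from': 0, 'size': 200}
)
for line in resp.json().get('log', []):
    print(line)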
The following link provides a description of how to allocate executor memory:
https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
I found it very useful, but found that the following parameters need to be updated as per our application requirements.
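As a rough illustration (not necessarily the exact parameters the answer above had in mind), the blog's sizing approach applied to an 8-core / 12 GB node might look like this; the node shape comes from the question, everything else is an assumption:

# Illustrative sizing along the lines of the AWS blog linked above.
vcores_per_node = 8
ram_per_node_gb = 12

executor_cores = min(5, vcores_per_node - 1)                 # ~5 cores per executor
executors_per_node = max(1, (vcores_per_node - 1) // executor_cores)
total_exec_mem_gb = ram_per_node_gb / executors_per_node     # RAM per executor
executor_memory_gb = int(total_exec_mem_gb * 0.9)            # ~90% as heap
memory_overhead_gb = total_exec_mem_gb - executor_memory_gb  # remainder as overhead

print(executor_cores, executors_per_node,
      '{}g heap + {}g overhead'.format(executor_memory_gb, memory_overhead_gb))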