I wrote a Spark Streaming application built with sbt. It works perfectly fine locally, but after deploying it on the cluster, it complains about a class I wrote which is clearly in the fat jar (checked using jar tvf). The following is my project structure; the XXX object is the one Spark complains about:
src
`-- main
    `-- scala
        |-- packageName
        |   `-- XXX object
        `-- mainMethodEntryObject
My submit command:
$SPARK_HOME/bin/spark-submit \
  --class mainMethodEntryObject \
  --master REST_URL \
  --deploy-mode cluster \
  hdfs:///FAT_JAR_PRODUCED_BY_SBT_ASSEMBLY
Specific error message:
java.lang.NoClassDefFoundError: Could not initialize class XXX
I ran into this issue for a reason similar to this user: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-td18972.html
I was calling a method on an object that had a few variables defined on the object itself, including spark and a logger, like this
val spark = SparkSession
  .builder()
  .getOrCreate()

val logger = LoggerFactory.getLogger(this.getClass.getName)
The function I was calling called another function on the object, which called another function, which called yet another function on the object inside a flatMap call on an RDD. I was getting the NoClassDefFoundError in a stack trace where the previous two function calls in the stack were functions on the class Spark was telling me did not exist. Based on the conversation linked above, my hypothesis was that the global spark reference wasn't getting initialized by the time the function that used it was called (the one that resulted in the NoClassDefFoundError exception).
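To make the shape of the problem concrete, here is a minimal sketch of roughly how my code was laid out (the names are hypothetical). Because logger is referenced inside the flatMap, the whole object, including the spark val, has to be initialized on the executor:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

// Hypothetical reconstruction of the failing layout
object MyClass {
  val spark = SparkSession.builder().getOrCreate()
  val logger = LoggerFactory.getLogger(this.getClass.getName)

  def entryPoint(rdd: RDD[String]): RDD[String] = stepOne(rdd)

  def stepOne(rdd: RDD[String]): RDD[String] = stepTwo(rdd)

  def stepTwo(rdd: RDD[String]): RDD[String] =
    rdd.flatMap { line =>
      // Touching logger here forces MyClass to be initialized on the executor,
      // which also tries to run the spark val's initializer there and can fail
      // with "Could not initialize class MyClass"
      logger.debug(s"processing $line")
      line.split(" ")
    }
}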
After quite a few experiments, I found that this pattern worked to resolve the problem.
// Move global definitions here
object MyClassGlobalDef {
  val spark = SparkSession
    .builder()
    .getOrCreate()

  val logger = LoggerFactory.getLogger(this.getClass.getName)
}

// Force the globals object to be initialized
import MyClassGlobalDef._

object MyClass {
  // Functions here
}
It's kind of ugly, but Spark seems to like it.
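For what it's worth, the rest of the code barely changes. A hypothetical driver-side usage looks like this, with spark and logger resolved through the import:

import MyClassGlobalDef._

object MyClass {
  // spark and logger now come from MyClassGlobalDef via the import above
  def run(inputPath: String): Unit = {
    logger.info(s"reading from $inputPath")
    val lines = spark.read.textFile(inputPath)
    logger.info(s"read ${lines.count()} lines")
  }
}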
It's difficult to say without the code, but it looks like a serialization problem with your XXX object. I can't say I understand perfectly why, but the point is that the object is not shipped to the executor.
The solution that worked for me was to convert your object to a class that extends Serializable and just instantiate it where you need it. So basically, if I'm not wrong, you have

object test {
  def foo = ...
}

which would be used as test.foo in your main, but you need at minimum

class Test extends Serializable {
  def foo = ...
}

and then in your main have val test = new Test at the beginning, and that's it.
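Put together, a minimal, hypothetical example of that pattern (Test, foo and Main are placeholder names) could look like this. Because test is a plain Serializable instance, it gets captured by the closure and shipped to the executors along with it:

import org.apache.spark.sql.SparkSession

class Test extends Serializable {
  def foo(s: String): Int = s.length
}

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("example").getOrCreate()
    val test = new Test // instantiate it where you need it

    val lengths = spark.sparkContext
      .parallelize(Seq("a", "bb", "ccc"))
      .map(line => test.foo(line)) // test is serialized with the closure
      .collect()

    println(lengths.mkString(", "))
    spark.stop()
  }
}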