
Spark-submit ClassNotFound exception


I'm having problems with a "ClassNotFound" Exception using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id: Int): ClassToRoundTrip = {

    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes)  // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x => RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs generates a "ClassNotFound" exception at the Marshal.load call, where the ClassToRoundTrip object is deserialized (marked in the code above). Strangely, the direct instantiation of ClassToRoundTrip a few lines earlier works fine:

spark-submit --class "SimpleApp" \
             --master local[4] \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, if I add the extra parameters --driver-class-path and --jars, it works fine locally:

spark-submit --class "SimpleApp" \
             --master local[4] \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, submitting to a local dev master still generates the same issue:

spark-submit --class "SimpleApp" \
             --master spark://localhost.localdomain:7077 \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor.

Logs for one of the executors are here:

stdout: http://pastebin.com/raw.php?i=DQvvGhKm

stderr: http://pastebin.com/raw.php?i=MPZZVa0Q

I'm using Spark 1.0.2. The ClassToRoundTrip class is included in the JAR. I would rather not have to hardcode values in SPARK_CLASSPATH or SparkContext.addJar. Can anyone help?

asked Sep 05 '14 by puppet


2 Answers

I had this same issue. If the master is local, the program runs fine for most people; if it is set to a cluster URL such as "spark://myurl:7077" (as in my case), it doesn't work. Most people get this error because an anonymous class is not found during execution. It is resolved by using SparkContext.addJar("path to jar").

Make sure you are doing the following things:

  • Call SparkContext.addJar("path to the jar created by Maven [hint: mvn package]").
  • Set the master in code with SparkConf.setMaster("spark://myurl:7077") and pass the same URL as the --master argument when submitting the job on the command line.
  • When you specify the class on the command line, make sure you write its fully qualified name, e.g. "packageName.ClassName".
  • The final command should look like this: bin/spark-submit --class "packageName.ClassName" --master spark://myurl:7077 pathToYourJar/target/yourJarFromMaven.jar

Note: the jar pathToYourJar/target/yourJarFromMaven.jar in the last point is the same one that is added in code in the first point of this answer, as in the sketch below.
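
A minimal sketch of what those points look like together. The package name, object name, master URL, and jar path are the placeholders used above, not real values:

package packageName

import org.apache.spark.{SparkConf, SparkContext}

object ClassName {
  def main(args: Array[String]) {
    // Master URL set in code, matching the --master argument passed to spark-submit.
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://myurl:7077")
    val sc = new SparkContext(conf)

    // Ship the application jar (the one built by "mvn package" and passed to
    // spark-submit) to the executors explicitly.
    sc.addJar("pathToYourJar/target/yourJarFromMaven.jar")

    // ... rest of the job ...
    sc.stop()
  }
}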

answered Dec 21 '22 by busybug91


I also had the same issue. I think --jars was not shipping the jars to the executors. After I added them to the SparkConf, it worked fine:

 val conf = new SparkConf().setMaster("...").setJars(Seq("/a/b/x.jar", "/c/d/y.jar")) 
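
For context, a minimal sketch of how that conf might be wired into a job. The master URL and jar paths are placeholders carried over from the snippet above, not real values:

import org.apache.spark.{SparkConf, SparkContext}

object SetJarsExample {
  def main(args: Array[String]) {
    // setJars lists the jars Spark should distribute to every executor,
    // playing the same role as --jars on the spark-submit command line.
    val conf = new SparkConf()
      .setAppName("SetJars example")
      .setMaster("spark://myurl:7077")
      .setJars(Seq("/a/b/x.jar", "/c/d/y.jar"))
    val sc = new SparkContext(conf)

    // Closures executed on the executors can now load classes from those jars.
    sc.parallelize(1 to 5).foreach(x => println(x))
    sc.stop()
  }
}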

This web page for troubleshooting is useful too.

answered Dec 21 '22 by Yifei