
NoClassDefFoundError: SparkSession - even though build is working

I copied https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/RandomForestClassifierExample.scala into a new project and set up this build.sbt:

name := "newproject"
version := "1.0"
scalaVersion := "2.11.8"

javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
scalacOptions += "-deprecation"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11"  % "2.0.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11"   % "2.0.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.11" % "2.0.0" % "provided",
  "org.jpmml" % "jpmml-sparkml" % "1.1.1",
  "org.apache.maven.plugins" % "maven-shade-plugin" % "2.4.3",
  "org.scalatest" %% "scalatest" % "3.0.0"
)

I am able to build it from IntelliJ 2016.2.5, but when I run it I get the error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at org.apache.spark.examples.ml.RandomForestClassifierExample$.main(RandomForestClassifierExample.scala:32)
    at org.apache.spark.examples.ml.RandomForestClassifierExample.main(RandomForestClassifierExample.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more

I am even able to click on SparkSession and get to the source code. What is the problem?
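For reference, the line the stack trace points at is essentially the SparkSession creation at the top of the example; the sketch below is a paraphrase of the upstream file, not an exact copy:

import org.apache.spark.sql.SparkSession

// Paraphrase of the failing spot: SparkSession.builder touches the SparkSession
// companion object (org.apache.spark.sql.SparkSession$), so that class must be on
// the runtime classpath even though the build already compiles fine against it.
object SparkSessionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("RandomForestClassifierExample")
      .getOrCreate()
    spark.stop()
  }
}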

asked Nov 02 '16 by Make42

2 Answers

When you mark a dependency as provided, the build compiles against it, but it is not added to the classpath at runtime (it is assumed to already be there).

That is the correct setting when building Spark jobs for spark-submit, because they run inside a Spark container that does provide the dependency, and including it a second time would cause trouble.

However, when you run locally, you need that dependency to be present. So either change the build so the dependency is not provided (but then you will need to adjust it again when building the job for submission), or configure your runtime classpath in the IDE to include that jar.
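A middle ground between those two options (a sketch in sbt 0.13-style syntax to match the build.sbt above; worth verifying against your sbt version) is to keep the dependencies marked provided for packaging, but hand the compile classpath, which still contains them, to sbt's run task:

// Sketch: let `sbt run` see "provided" dependencies without changing what gets packaged.
// The compile classpath includes provided jars, so reusing it for `run` makes the
// Spark classes resolvable when launching the example locally.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated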

answered by Thilo

In my case, I was using my local Cloudera CDH 5.9.0 cluster, which had Spark 1.6.1 installed by default and Spark 2.0.0 installed as a parcel. Thus, spark-submit used Spark 1.6.1 while spark2-submit used Spark 2.0.0. Since SparkSession did not exist in 1.6.1, the error was thrown. Using the correct spark2-submit command resolved the problem.
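If it is unclear which Spark a given launcher actually puts on the classpath, one cheap check (a minimal sketch; SPARK_VERSION is a constant in the org.apache.spark package object in both 1.x and 2.x) is to print the version from a trivial job before relying on any 2.x-only API such as SparkSession:

import org.apache.spark.SPARK_VERSION

// Minimal sanity check: report the Spark version that is actually on the runtime
// classpath, so a 1.6.x vs 2.x launcher mix-up shows up immediately.
object SparkVersionCheck {
  def main(args: Array[String]): Unit = {
    println(s"Spark version on the classpath: $SPARK_VERSION")
  }
}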

answered by Garren S