I copied https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/RandomForestClassifierExample.scala into a new project and set up the following build.sbt:
name := "newproject"
version := "1.0"
scalaVersion := "2.11.8"
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
scalacOptions += "-deprecation"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.0.0" % "provided",
"org.apache.spark" % "spark-sql_2.11" % "2.0.0" % "provided",
"org.apache.spark" % "spark-mllib_2.11" % "2.0.0" % "provided",
"org.jpmml" % "jpmml-sparkml" % "1.1.1",
"org.apache.maven.plugins" % "maven-shade-plugin" % "2.4.3",
"org.scalatest" %% "scalatest" % "3.0.0"
)
I am able to build it in IntelliJ IDEA 2016.2.5, but when I run it I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
at org.apache.spark.examples.ml.RandomForestClassifierExample$.main(RandomForestClassifierExample.scala:32)
at org.apache.spark.examples.ml.RandomForestClassifierExample.main(RandomForestClassifierExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I am even able to click on SparkSession and get to the source code. What is the problem?
When you say provided for your dependency, the build will compile against that dependency, but it will not be added to the classpath at runtime (it is assumed to already be there). That is the correct setting when building Spark jobs for spark-submit, because they will run inside a Spark container that does provide the dependency, and including it a second time would cause trouble. However, when you run locally, you need that dependency to be present. So either change the build to not mark it provided (but then you need to adjust it again when building for job submission), or configure your runtime classpath in the IDE to already include that jar file.
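If you want to keep the provided scope for spark-submit builds but still run the main class from sbt locally, one common workaround is to put the provided dependencies back on the run classpath only. This is a minimal sketch assuming sbt 0.13.x syntax (for sbt 1.x you would use the slash syntax instead):
// Add the "provided" dependencies back to the classpath used by `sbt run`,
// without packaging them into the artifact you pass to spark-submit.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated
When running from IntelliJ rather than sbt, the equivalent is to add the Spark jars to the run configuration's classpath, as described above.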
In my case, I was using my local Cloudera CDH 5.9.0 cluster with Spark 1.6.1 installed by default and Spark 2.0.0 installed as a parcel. Thus, spark-submit was using Spark 1.6.1 while spark2-submit was using Spark 2.0.0. Since SparkSession did not exist in 1.6.1, the error was thrown. Using the correct spark2-submit command resolved the problem.
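For context, the stack trace points at the SparkSession builder call near the top of the example, an API that only exists in Spark 2.x. A rough sketch of that entry point (not the full example):
import org.apache.spark.sql.SparkSession

object RandomForestClassifierExample {
  def main(args: Array[String]): Unit = {
    // SparkSession was introduced in Spark 2.0; on a Spark 1.6.1 runtime
    // the class is simply not there, hence the NoClassDefFoundError.
    val spark = SparkSession
      .builder
      .appName("RandomForestClassifierExample")
      .getOrCreate()

    // ... rest of the example's ML pipeline ...

    spark.stop()
  }
}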