I have written a Spark job in Java. When I submit the job, it gives the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
at com.thinkbiganalytics.veon.util.SparkSessionBuilder.getOrCreateSparkSession(SparkSessionBuilder.java:12)
at com.thinkbiganalytics.veon.AbstractSparkTransformation.initSparkSession(AbstractSparkTransformation.java:92)
at com.thinkbiganalytics.veon.transformations.SDPServiceFeeDeductionSourceToEventStore.init(SDPServiceFeeDeductionSourceToEventStore.java:57)
at com.thinkbiganalytics.veon.AbstractSparkTransformation.doTransform(AbstractSparkTransformation.java:51)
at com.thinkbiganalytics.veon.transformations.SDPServiceFeeDeductionSourceToEventStore.main(SDPServiceFeeDeductionSourceToEventStore.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
If you're running from inside IntelliJ IDEA and you've marked your Spark library as "provided", like so: "org.apache.spark" %% "spark-sql" % "3.0.1" % "provided", then you need to edit your Run/Debug configuration and check the "Include dependencies with Provided scope" box.
I was facing this issue while running from the IntelliJ editor. I had marked the Spark jars as provided in pom.xml, see below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.0</version>
    <scope>provided</scope>
</dependency>
On removing the provided scope, the error was gone.
When the Spark jars are marked as provided, they are supplied only when you run the application with spark-submit, or when the Spark jars are already on the classpath.
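For reference, a submission in that case would typically look something like the following; the jar path is just a placeholder for your build output, the main class is the one from the stack trace above, and the YARN master is an assumption - use whatever master you normally target:
spark-submit \
  --class com.thinkbiganalytics.veon.transformations.SDPServiceFeeDeductionSourceToEventStore \
  --master yarn \
  target/your-application.jar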
When submitting with spark-submit, check that the Spark version in your pom.xml matches the Spark version installed on the machine.
This may be because you have two Spark versions on the same machine.
If you want to have different Spark installations on your machine, you can create different soft links and use the exact Spark version against which you have built your project:
spark1-submit -> /Users/test/sparks/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
spark2-submit -> /Users/test/sparks/spark-2.1.1-bin-hadoop2.7/bin/spark-submit
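Such links could be created, for example, like this (the /usr/local/bin target directory is only an assumption - any directory on your PATH will do):
ln -s /Users/test/sparks/spark-1.6.2-bin-hadoop2.6/bin/spark-submit /usr/local/bin/spark1-submit
ln -s /Users/test/sparks/spark-2.1.1-bin-hadoop2.7/bin/spark-submit /usr/local/bin/spark2-submit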
Here is a link to a Cloudera community thread about running multiple Spark versions on the same cluster: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Multiple-Spark-version-on-the-same-cluster/td-p/39880
You are probably deploying your application on a cluster with a lower Spark version.
Please check the Spark version on your cluster - it should be the same as the version in pom.xml. Please also note that all Spark dependencies should be marked as provided when you use spark-submit to deploy the application.
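For example, the version that the cluster's spark-submit actually ships can be checked with:
spark-submit --version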