java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession

I have written a Spark Job in Java. When I submit the Job it gives below error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
        at com.thinkbiganalytics.veon.util.SparkSessionBuilder.getOrCreateSparkSession(SparkSessionBuilder.java:12)
        at com.thinkbiganalytics.veon.AbstractSparkTransformation.initSparkSession(AbstractSparkTransformation.java:92)
        at com.thinkbiganalytics.veon.transformations.SDPServiceFeeDeductionSourceToEventStore.init(SDPServiceFeeDeductionSourceToEventStore.java:57)
        at com.thinkbiganalytics.veon.AbstractSparkTransformation.doTransform(AbstractSparkTransformation.java:51)
        at com.thinkbiganalytics.veon.transformations.SDPServiceFeeDeductionSourceToEventStore.main(SDPServiceFeeDeductionSourceToEventStore.java:51)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
asked Jul 31 '17 by us56

4 Answers

If you're running from inside IntelliJ IDEA and you've marked your Spark library as "provided", like so: "org.apache.spark" %% "spark-sql" % "3.0.1" % "provided", then you need to edit your Run/Debug configuration and check the "Include dependencies with Provided scope" box.

answered Nov 13 '22 by Jeremy


I was facing this issue while running from the IntelliJ IDEA editor. I had marked the Spark jars as provided in pom.xml, see below:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.0</version>
    <scope>provided</scope>
</dependency>

On removing the provided scope, the error was gone.

With provided scope, the Spark jars are supplied only at runtime, i.e. when you launch the application with spark-submit or otherwise have the Spark jars on the classpath.
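For example, a launch through spark-submit might look like the following. The class name is taken from the stack trace above; the master and jar path are assumptions for illustration:

```shell
# Class name from the stack trace; --master and the jar path are assumptions.
spark-submit \
  --class com.thinkbiganalytics.veon.transformations.SDPServiceFeeDeductionSourceToEventStore \
  --master yarn \
  target/veon-transformations-1.0.jar
```

Launched this way, the cluster's own Spark jars supply org.apache.spark.sql.SparkSession, so the provided scope causes no error.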

answered Nov 13 '22 by userab


When submitting with spark-submit, check that the Spark version declared as a dependency in your project's pom.xml matches the Spark version you are submitting with.

This may be because you have two Spark versions on the same machine.


If you want to have different Spark installations on your machine, you can create a soft link for each and use the exact Spark version against which you built your project:

spark1-submit -> /Users/test/sparks/spark-1.6.2-bin-hadoop2.6/bin/spark-submit

spark2-submit -> /Users/test/sparks/spark-2.1.1-bin-hadoop2.7/bin/spark-submit

Here is a link from the Cloudera community about running multiple Spark versions on the same cluster: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Multiple-Spark-version-on-the-same-cluster/td-p/39880
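The two soft links above can be created like this (a sketch; the install paths are the illustrative ones from the answer, and the link directory stands in for a directory on your PATH such as ~/bin):

```shell
# Illustrative sketch: the Spark install paths are assumptions; point the
# links at wherever each Spark distribution is actually unpacked.
dir=$(mktemp -d)   # stand-in for a directory on your PATH, e.g. ~/bin
ln -s /Users/test/sparks/spark-1.6.2-bin-hadoop2.6/bin/spark-submit "$dir/spark1-submit"
ln -s /Users/test/sparks/spark-2.1.1-bin-hadoop2.7/bin/spark-submit "$dir/spark2-submit"
readlink "$dir/spark1-submit"   # shows which binary the alias resolves to
```

Each alias then launches exactly one Spark version, so a project built against Spark 2.x is always submitted with the matching spark-submit.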

answered Nov 13 '22 by ankursingh1000


Probably you are deploying your application on a cluster with a lower Spark version.

Please check the Spark version on your cluster: it should be the same as the version in pom.xml. Please also note that all Spark dependencies should be marked as provided when you use spark-submit to deploy the application.
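One quick way to compare the two versions is to pull the Spark version out of pom.xml and set it against what spark-submit --version prints on the cluster. A minimal sketch, using a stand-in pom.xml fragment (the real check would read your project's actual pom.xml):

```shell
# Stand-in pom.xml fragment; substitute your project's real pom.xml.
pom=$(mktemp)
cat > "$pom" <<'EOF'
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.4.0</version>
  <scope>provided</scope>
</dependency>
EOF

# Extract the declared Spark version from the <version> element.
pom_version=$(sed -n 's/.*<version>\(.*\)<\/version>.*/\1/p' "$pom")
echo "pom.xml builds against Spark $pom_version"
# On the cluster, compare against the output of: spark-submit --version
```

If the two versions disagree, rebuild the project against the cluster's Spark version before submitting.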

answered Nov 13 '22 by T. Gawęda