I am working on a Hadoop cluster that runs Spark 2.3.x. For my use case I need Spark 2.4.x, so I downloaded it, copied it to my server, and extracted it into a new directory: ~/john/spark247ext/spark-2.4.7-bin-hadoop2.7
This is what my Spark 2.4.7 directory looks like:
username@:[~/john/spark247ext/spark-2.4.7-bin-hadoop2.7] {173} $ ls
bin conf data examples jars kubernetes LICENSE licenses NOTICE python R README.md RELEASE sbin yarn
These are the contents of my bin dir.
username@:[~/john/spark247ext/spark-2.4.7-bin-hadoop2.7/bin] {175} $ ls
beeline find-spark-home.cmd pyspark2.cmd spark-class sparkR2.cmd spark-shell.cmd spark-submit
beeline.cmd load-spark-env.cmd pyspark.cmd spark-class2.cmd sparkR.cmd spark-sql spark-submit2.cmd
docker-image-tool.sh load-spark-env.sh run-example spark-class.cmd spark-shell spark-sql2.cmd spark-submit.cmd
find-spark-home pyspark run-example.cmd sparkR spark-shell2.cmd spark-sql.cmd
I am submitting my Spark code using the spark-submit command below:
./spark-submit --master yarn --deploy-mode cluster --driver-class-path /home/john/jars/mssql-jdbc-9.2.0.jre8.jar --jars /home/john/jars/spark-bigquery-with-dependencies_2.11-0.19.1.jar,/home/john/jars/mssql-jdbc-9.2.0.jre8.jar --driver-memory 1g --executor-memory 4g --executor-cores 4 --num-executors 4 --class com.loader /home/john/jars/HiveLoader-1.0-SNAPSHOT-jar-with-dependencies.jar somearg1 somearg2 somearg3
The job fails with the exception java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig,
so I added that jar to my spark-submit command as well, as shown below.
./spark-submit --master yarn --deploy-mode cluster --driver-class-path /home/john/jars/mssql-jdbc-9.2.0.jre8.jar --jars /home/john/jars/spark-bigquery-with-dependencies_2.11-0.19.1.jar,/home/john/jars/mssql-jdbc-9.2.0.jre8.jar,/home/john/jars/jersey-client-1.19.4.jar --driver-memory 1g --executor-memory 4g --executor-cores 4 --num-executors 4 --class com.loader /home/john/jars/HiveLoader-1.0-SNAPSHOT-jar-with-dependencies.jar somearg1 somearg2 somearg3
I also checked the directory /john/spark247ext/spark-2.4.7-bin-hadoop2.7/jars and found that a jersey-client-x.xx.x.jar exists there.
username@:[~/john/spark247ext/spark-2.4.7-bin-hadoop2.7/jars] {179} $ ls -ltr | grep jersey
-rwxrwxrwx 1 john john 951701 Sep 8 2020 jersey-server-2.22.2.jar
-rwxrwxrwx 1 john john 72733 Sep 8 2020 jersey-media-jaxb-2.22.2.jar
-rwxrwxrwx 1 john john 971310 Sep 8 2020 jersey-guava-2.22.2.jar
-rwxrwxrwx 1 john john 66270 Sep 8 2020 jersey-container-servlet-core-2.22.2.jar
-rwxrwxrwx 1 john john 18098 Sep 8 2020 jersey-container-servlet-2.22.2.jar
-rwxrwxrwx 1 john john 698375 Sep 8 2020 jersey-common-2.22.2.jar
-rwxrwxrwx 1 john john 167421 Sep 8 2020 jersey-client-2.22.2.jar
I also added the dependency in my pom.xml file:
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-client</artifactId>
<version>1.19.4</version>
</dependency>
Even after passing the jar file in my spark-submit command and also building a fat jar from my Maven project that contains all dependencies, I still see the exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:161)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1135)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1530)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
The Spark I downloaded is only for my own use case, so I haven't changed any settings of the project's existing Spark installation, which is Spark 2.3.
Could anyone tell me what I should do to fix this so that the code runs properly?
Can you try setting this property in your spark-submit command?
--conf "spark.driver.userClassPathFirst=true"
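Applied to the command from the question, the suggestion might look like the sketch below. The function name and the RUN variable are hypothetical helpers added only so the command can be printed and reviewed before it is executed; the paths, class name, and arguments are copied from the question. Spark's documentation describes spark.driver.userClassPathFirst as experimental and as applying in cluster mode only, and the stack trace in the question is raised inside the spark-submit client itself (SparkSubmit.main calling the YARN Client), so treat this as something to try rather than a guaranteed fix.

```shell
# Hypothetical wrapper around the spark-submit call from the question.
# Dry run by default: it only prints the command.
# Set RUN=./spark-submit in the environment to actually execute it.
submit_with_user_classpath_first() {
  jars_dir=/home/john/jars   # jar location taken from the question
  ${RUN:-echo ./spark-submit} \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.driver.userClassPathFirst=true \
    --driver-class-path "$jars_dir/mssql-jdbc-9.2.0.jre8.jar" \
    --jars "$jars_dir/spark-bigquery-with-dependencies_2.11-0.19.1.jar,$jars_dir/mssql-jdbc-9.2.0.jre8.jar,$jars_dir/jersey-client-1.19.4.jar" \
    --driver-memory 1g \
    --executor-memory 4g \
    --executor-cores 4 \
    --num-executors 4 \
    --class com.loader \
    "$jars_dir/HiveLoader-1.0-SNAPSHOT-jar-with-dependencies.jar" \
    somearg1 somearg2 somearg3
}

# Print the full command so it can be checked before running it for real.
submit_with_user_classpath_first
```

Run from the bin directory of the Spark 2.4.7 installation, so that ./spark-submit resolves to the 2.4.7 launcher rather than the cluster's Spark 2.3 one.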
I think you are hitting a jar conflict, where a different version of the same jar is being picked up from the environment.
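One way to check for such a conflict is to look for the missing class inside the jars themselves instead of relying on file names. The sketch below is a rough helper, not part of Spark: class_to_entry and find_class_in_jars are hypothetical names, and it assumes the unzip tool is installed on the server. Note that the jersey-*-2.22.2 jars listed in the question are Jersey 2.x, whose classes live under org.glassfish.jersey, while the missing class is in the Jersey 1.x package com.sun.jersey, so a matching file name does not mean the class is present.

```shell
# Convert a fully qualified class name into its path inside a jar,
# e.g. a.b.C becomes a/b/C.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in directory $1 that contains class $2.
# Assumes the unzip command is available; jars that cannot be read are skipped.
find_class_in_jars() {
  dir=$1
  entry=$(class_to_entry "$2")
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue          # directory has no jars at all
    if unzip -l "$jar" 2>/dev/null | grep -q -F "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Example call, using the directory and class name from the question:
# find_class_in_jars ~/john/spark247ext/spark-2.4.7-bin-hadoop2.7/jars \
#   com.sun.jersey.api.client.config.ClientConfig
```

If the call prints nothing for the Spark jars directory but does print /home/john/jars/jersey-client-1.19.4.jar when pointed at /home/john/jars, the class exists only in the user-supplied jar, and the remaining question is whether that jar is on the classpath of the process that throws the exception.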