Classpath resolution between spark uber jar and spark-submit --jars when similar classes exist in both

Tags:

apache-spark

What is the precedence in class loading when both the uber JAR of my Spark application and the contents of the --jars option to my spark-submit command contain similar dependencies?

I ask this from a third-party library integration standpoint. If I set --jars to use a third-party library at version 2.0, and the uber JAR passed to spark-submit was assembled using version 2.1, which class is loaded at runtime?

At present, I am thinking of keeping my dependencies on HDFS and adding them to the --jars option of spark-submit, while asking users via end-user documentation to set the scope of this third-party library to 'provided' in their Spark application's Maven POM file.
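For illustration, a minimal sketch of what that invocation might look like, assuming hypothetical application, library, and HDFS path names (the library JAR is pulled from HDFS via --jars, and the uber JAR was assembled with the library marked 'provided' so it is not bundled):

    spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --jars hdfs:///libs/thirdparty-lib-2.0.jar \
      my-app-uber.jar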

asked Jul 01 '15 by Sudarshan Thitte

People also ask

How to specify multiple JARs in spark-submit?

You can add JARs using the spark-submit option --jars; with this option you can add a single JAR or multiple JARs separated by commas.

How do we submit JAR files in Spark?

One is set through spark-submit and the other via code; choose the one that suits you better. One important thing to note is that using either of these options does not add the JAR file to your driver/executor classpath. You'll need to explicitly add them using the extraClassPath configuration on both.
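As a sketch of that last point, assuming a hypothetical dependency path, the extraClassPath settings could be passed alongside --jars like this:

    spark-submit \
      --jars /opt/libs/dep.jar \
      --conf spark.driver.extraClassPath=/opt/libs/dep.jar \
      --conf spark.executor.extraClassPath=/opt/libs/dep.jar \
      my-app.jar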

How do you add an external JAR in spark-submit from a local Maven repository?

Use the --jars option. To add JARs to a Spark job, the --jars option can be used to include JARs on the Spark driver and executor classpaths. If multiple JAR files need to be included, use commas to separate them. For example:

    spark-submit --jars /path/to/jar/file1,/path/to/jar/file2 ...

What are Spark JARs?

Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. Many developers write Spark code in browser-based notebooks because they're unfamiliar with JAR files.


1 Answer

This is controlled to some extent by two configuration parameters:

  • spark.driver.userClassPathFirst
  • spark.executor.userClassPathFirst

If these are set to true (the default is false), then, from the docs:

(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.
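A sketch of how these might be set on a spark-submit invocation (the JAR names and paths are hypothetical; the configuration keys are the Spark settings quoted above, and the driver-side one applies in cluster deploy mode):

    spark-submit \
      --deploy-mode cluster \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --jars hdfs:///libs/thirdparty-lib-2.0.jar \
      my-app-uber.jar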

I wrote some of the code that controls this, and there were a few bugs in the early releases, but if you're using a recent Spark release it should work (although it is still an experimental feature).

answered Oct 22 '22 by Holden