In Submitting Applications in the Spark docs, as of 1.6.0 and earlier, it's not clear how to specify the --jars argument, as it's apparently not a colon-separated classpath nor a directory expansion.
The docs say "Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes."
Question: What are all the options for submitting a classpath with --jars in the spark-submit script in $SPARK_HOME/bin? Anything undocumented that could be submitted as an improvement for docs?
I ask because when I was testing --jars today, we had to explicitly provide a path to each jar:
/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData --jars=local:/usr/local/spark/jars/groovy-all-2.3.3.jar,local:/usr/local/spark/jars/guava-14.0.1.jar,local:/usr/local/spark/jars/jopt-simple-4.6.jar,local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/jpsgcs-pipe-1.0.6-7.jar /usr/local/spark/jars/thold-0.0.1-1.jar
We are choosing to pre-populate the cluster with all the jars in /usr/local/spark/jars on each worker. It seemed that if no local:/, file:/, or hdfs:// scheme was supplied, the default is file:/, and the driver makes the jars available via a web server it runs. I chose local:, as above.
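For reference, a minimal sketch of the three URL schemes mentioned above; the jar names, class name, and HDFS address here are hypothetical placeholders, not the exact jars from the command above:

# Sketch of the URL schemes --jars accepts (hypothetical paths):
#   local:/ - the jar is expected to already exist at that path on every node; nothing is copied.
#   file:/  - the jar is served to executors from a file server run by the driver.
#   hdfs:// - the jar is fetched from HDFS by the driver and executors.
spark-submit --class com.example.Main \
  --jars local:/usr/local/spark/jars/dep-a.jar,file:/home/me/libs/dep-b.jar,hdfs://namenode:8020/libs/dep-c.jar \
  /usr/local/spark/jars/app.jar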
And it seems that we do not need to put the main jar in the --jars argument. I have not tested yet whether other classes in the final argument (the application-jar argument per the docs, i.e. /usr/local/spark/jars/thold-0.0.1-1.jar) are shipped to workers, or whether I need to put the application-jar in the --jars path to get classes not named after --class to be seen.
(And granted, with Spark standalone mode using --deploy-mode client, you also have to put a copy of the driver on each worker, but you don't know up front which worker will run the driver.)
You can also add jars using the spark-submit option --jars; with this option you can add a single jar or multiple jars as a comma-separated list.
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. Directory expansion does not work with --jars.
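To make that concrete, here is a minimal sketch (the class name and jar paths are hypothetical): an explicit comma-separated list works, while pointing --jars at a directory does not, since directory expansion is not performed.

# Works: an explicit, comma-separated list of jar URLs (hypothetical paths).
spark-submit --class com.example.Main \
  --jars /opt/libs/dep-a.jar,/opt/libs/dep-b.jar \
  /opt/app/app.jar

# Does not work as a shortcut: --jars will not expand a directory into the jars it contains.
spark-submit --class com.example.Main \
  --jars /opt/libs/ \
  /opt/app/app.jar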
Once you do a spark-submit, a driver program is launched; it requests resources from the cluster manager, and at the same time the main program of the user application is started by the driver program.
This way it worked easily, instead of specifying each jar with its version separately:
#!/bin/sh
# Build a comma-separated list of all dependent jars in OTHER_JARS,
# skipping the application jar, which is passed separately.
JARS=$(find ../lib -name '*.jar')
OTHER_JARS=""
for eachjarinlib in $JARS ; do
  if [ "$(basename "$eachjarinlib")" != "APPLICATIONJARTOBEADDEDSEPARATELY.JAR" ]; then
    # Prepend with a comma only when the list is non-empty, to avoid a trailing comma.
    if [ -z "$OTHER_JARS" ]; then
      OTHER_JARS=$eachjarinlib
    else
      OTHER_JARS=$eachjarinlib,$OTHER_JARS
    fi
  fi
done
echo "--- final list of jars is: $OTHER_JARS"
echo "$CLASSPATH"
spark-submit --verbose --class <yourclass> \
  ... OTHER OPTIONS \
  --jars $OTHER_JARS,APPLICATIONJARTOBEADDEDSEPARATELY.JAR
The tr unix command can also help, as in the example below:
--jars $(echo /dir_of_jars/*.jar | tr ' ' ',')
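For instance, a minimal sketch of a full invocation using that substitution (the class name and application jar path are hypothetical placeholders):

# Turn every jar under /dir_of_jars into a comma-separated list for --jars
# (com.example.Main and /path/to/app.jar are hypothetical).
spark-submit --class com.example.Main \
  --jars $(echo /dir_of_jars/*.jar | tr ' ' ',') \
  /path/to/app.jar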