Can someone explain the differences between --packages and --jars in a spark-submit script?
nohup ./bin/spark-submit --jars ./xxx/extrajars/stanford-corenlp-3.8.0.jar,./xxx/extrajars/stanford-parser-3.8.0.jar \
--packages datastax:spark-cassandra-connector_2.11:2.0.7 \
--class xxx.mlserver.Application \
--conf spark.cassandra.connection.host=192.168.0.33 \
--conf spark.cores.max=4 \
--master spark://192.168.0.141:7077 ./xxx/xxxanalysis-mlserver-0.1.0.jar 1000 > ./logs/nohup.out &
Also, do I require the --packages configuration if the dependency is in my application's pom.xml? (I ask because I just blew up my application by changing the version in --packages while forgetting to change it in the pom.xml.)
I am using --jars currently because the jars are massive (over 100 GB) and thus slow down the shaded-jar compilation. I admit I am not sure why I am using --packages, other than because I am following the DataStax documentation.
Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. A lot of developers write Spark code in browser-based notebooks because they're unfamiliar with JAR files.
JARs are bundles of compiled Java code files. Each library you install that internally uses Spark (or PySpark) has its own JAR files, which need to be available to both the driver and the executors so they can execute the package API calls that the user interacts with.
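The same classpath distribution can also be expressed through Spark configuration properties instead of flags; a minimal sketch, where the paths and the com.example.Main entry class are placeholders (spark.jars is the config equivalent of --jars):
spark-submit \
  --conf spark.jars=/path/to/lib1.jar,/path/to/lib2.jar \
  --class com.example.Main \
  app.jar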
What happens when a Spark job is submitted? When a client submits Spark user application code, the driver implicitly converts the code containing transformations and actions into a logical directed acyclic graph (DAG).
To add JARs to a Spark job, the --jars option can be used to include them on the Spark driver and executor classpaths. If multiple JAR files need to be included, separate them with commas. The following is an example:
spark-submit --jars /path/to/jar/file1,/path/to/jar/file2 ...
If you run spark-submit --help, it will show:
--jars JARS Comma-separated list of jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
With --jars, Spark doesn't hit Maven; it searches for the specified JARs in the local file system. It also supports the following URL schemes: hdfs, http, https, and ftp.
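A sketch of what that looks like, mixing the supported schemes (the hosts, paths, and com.example.Main class below are illustrative, not from the question):
spark-submit \
  --jars /local/libs/extra1.jar,hdfs://namenode:8020/libs/extra2.jar,https://repo.example.com/libs/extra3.jar \
  --class com.example.Main \
  app.jar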
With --packages, Spark searches for the specified package in the local Maven repo, then Maven Central, then any repo provided via --repositories, and downloads it.
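For example, the Cassandra connector from the question can be resolved by its Maven coordinates (groupId:artifactId:version); the extra repository URL and com.example.Main class here are hypothetical placeholders:
spark-submit \
  --packages datastax:spark-cassandra-connector_2.11:2.0.7 \
  --repositories https://repo.example.com/maven \
  --class com.example.Main \
  app.jar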
Now, coming back to your question:
Also, do I require the --packages configuration if the dependency is in my application's pom.xml?
Ans: No, if you are not importing/using the classes in the JAR directly but only need them loaded by some class loader or service loader (e.g. JDBC drivers); yes otherwise.
BTW, if you are pinning a specific version of a specific jar in your pom.xml, why not build an uber/fat jar of your application, or provide the dependency jar via the --jars argument, instead of using --packages? A sketch of that alternative follows below.
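Once every dependency is shaded into the application JAR (for example with the Maven Shade plugin), the submit line needs neither --jars nor --packages; the uber-JAR file name below is illustrative, the other values are taken from the question:
spark-submit \
  --class xxx.mlserver.Application \
  --master spark://192.168.0.141:7077 \
  ./xxx/xxxanalysis-mlserver-0.1.0-uber.jar 1000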
Links to refer:
spark advanced-dependency-management
add-jars-to-a-spark-job-spark-submit