Upload zip file using --archives option of spark-submit on yarn

I have a directory with some model files, and for some reason my application has to access these model files on the local file system.

I know that the --files option of spark-submit can upload files to the working directory of each executor, and that does work.

However, I want to keep the directory structure of my files, so I turned to the --archives option, which is described as:

YARN-only:
......
--archives ARCHIVES         Comma separated list of archives to be extracted into the working directory of each executor.
......

But when I actually used it to upload models.zip, I found that YARN just put the file there without extracting it, as it does with --files. Have I misunderstood "to be extracted", or have I misused this option?

asked Jan 06 '17 03:01

Mo Tao

1 Answer

Found the answer myself.

YARN does extract the archive, but it adds an extra folder with the same name as the archive. To make it clear: if I put models/model1 and models/model2 in models.zip, then I have to access my models as models.zip/models/model1 and models.zip/models/model2.
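
As a rough sketch of what this looks like in practice, assuming a hypothetical application class com.example.ModelApp packaged in a hypothetical model-app.jar (neither name comes from the question), the submit command and the resulting relative paths might look like this. The member_path helper is only an illustration of the naming rule described above, not part of Spark or YARN.

```shell
# Hypothetical names: com.example.ModelApp and model-app.jar are placeholders.
# Wrapped in a function so that nothing is submitted until you call it.
submit_models() {
  spark-submit \
    --master yarn \
    --archives models.zip \
    --class com.example.ModelApp \
    model-app.jar
}

# Illustrative helper (not a Spark API): relative path of an archive member
# inside the executor's working directory, following the rule described above:
#   <archive file name>/<path inside the archive>
member_path() {
  printf '%s/%s\n' "${1##*/}" "$2"
}

member_path models.zip models/model1   # prints models.zip/models/model1
member_path models.zip models/model2   # prints models.zip/models/model2
```

The application would then open its models through those relative paths, since they are resolved against the working directory of each executor.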

Moreover, we can make this tidier by using the # syntax, as the Spark documentation describes:

The --files and --archives options support specifying file names with the # similar to Hadoop. For example you can specify: --files localtest.txt#appSees.txt and this will upload the file you have locally named localtest.txt into HDFS but this will be linked to by the name appSees.txt, and your application should use the name as appSees.txt to reference it when running on YARN.
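
Applied to the archive from the question, a minimal sketch could look like the following. The class name com.example.ModelApp and the jar model-app.jar are hypothetical placeholders, and link_name is just an illustrative helper that mirrors the naming rule quoted above; it is not something Spark provides.

```shell
# Hypothetical names: com.example.ModelApp and model-app.jar are placeholders.
# Wrapped in a function so that nothing is submitted until you call it.
submit_models_aliased() {
  spark-submit \
    --master yarn \
    --archives 'models.zip#models' \
    --class com.example.ModelApp \
    model-app.jar
}

# Illustrative helper (not a Spark API): the name a --files/--archives value
# is linked under. It is the part after '#', or the file name if there is no '#'.
link_name() {
  case $1 in
    *'#'*) printf '%s\n' "${1##*#}" ;;
    *)     printf '%s\n' "${1##*/}" ;;
  esac
}

link_name 'models.zip#models'          # prints models
link_name 'localtest.txt#appSees.txt'  # prints appSees.txt
link_name 'models.zip'                 # prints models.zip
```

With the alias in place, the files should be reachable as models/models/model1 and models/models/model2, because the folders stored inside the zip are kept under the link name. Quoting the value keeps the shell from treating # specially.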

Edit:

This answer was tested on Spark 2.0.0, and I'm not sure about the behavior in other versions.

answered Sep 22 '22 11:09

Mo Tao

Tags: zip, scala, apache-spark, hadoop-yarn
