In Hadoop, you can use the distributed cache to copy read-only files to each node. What is the equivalent way of doing so in Spark? I know about broadcast variables, but those are only good for variables, not files.
Caching methods in Spark:

- DISK_ONLY: persist data on disk only, in serialized format.
- MEMORY_ONLY: persist data in memory only, in deserialized format.
- MEMORY_AND_DISK: persist data in memory; if enough memory is not available, evicted blocks are stored on disk.
- OFF_HEAP: persist data in off-heap memory.
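As a minimal sketch, choosing one of these levels with RDD.persist might look like this (the data and app name are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("PersistLevels").master("local[*]").getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(1 to 1000000)

// Keep blocks in memory, spilling evicted blocks to disk when memory runs low:
rdd.persist(StorageLevel.MEMORY_AND_DISK)

rdd.count() // first action computes and caches the data
rdd.count() // later actions read from the cache

rdd.unpersist() // release the cached blocks when no longer needed
```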
Files can be on HDFS, the local filesystem, or any Hadoop-readable filesystem such as S3. If the user does not specify a scheme, Hadoop assumes the file is on the local filesystem, even when the default filesystem is not the local one. Archive files can also be copied using the --archives option.
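For illustration, a hypothetical spark-submit invocation shipping a file and an archive could look like the comment below; all paths, names, and the '#' aliases are placeholders:

```scala
// spark-submit \
//   --files hdfs:///config/lookup.txt \
//   --archives s3a://my-bucket/ref-data.zip#refdata \
//   --class com.example.MyApp myapp.jar

// Inside tasks, a file shipped with --files can be resolved via SparkFiles:
import org.apache.spark.SparkFiles
import scala.io.Source

val lines = Source.fromFile(SparkFiles.get("lookup.txt")).getLines().toList
// On YARN, the archive is unpacked in the container's working directory
// under the alias given after '#', i.e. ./refdata/ in this example.
```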
Hadoop DistributedCache is a mechanism provided by the Hadoop MapReduce framework that copies read-only files, archives, or jar files to the worker nodes before any task of the job executes on those nodes. Files are normally copied only once per job, to save network bandwidth.
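As a sketch of the driver-side Hadoop API (using the modern Job methods that replaced the deprecated DistributedCache class; the path and symlink name are placeholders):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance(new Configuration(), "cache-example")

// Ship a read-only file to every worker before tasks start; the optional
// '#stopwords' fragment names a symlink in the task's working directory.
job.addCacheFile(new URI("hdfs:///shared/stopwords.txt#stopwords"))

// Inside a Mapper or Reducer, context.getCacheFiles() lists the cached URIs.
```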
More generally, a distributed cache is a system that pools the random-access memory (RAM) of multiple networked computers into a single in-memory data store, providing fast access to shared data.
Take a look at SparkContext.addFile()
Add a file to be downloaded with this Spark job on every node. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, use SparkFiles.get(fileName) to find its download location.
A directory can be given if the recursive option is set to true. Currently directories are only supported for Hadoop-supported filesystems.
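A minimal sketch of addFile plus SparkFiles.get (the path and lookup contents are placeholders):

```scala
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession
import scala.io.Source

val spark = SparkSession.builder().appName("AddFileExample").getOrCreate()
val sc = spark.sparkContext

// Ship a read-only lookup file to every node:
sc.addFile("hdfs:///shared/lookup.txt")

val result = sc.parallelize(Seq("a", "b", "c")).mapPartitions { iter =>
  // SparkFiles.get resolves the file's local download location on this executor.
  val lookup = Source.fromFile(SparkFiles.get("lookup.txt")).getLines().toSet
  iter.filter(lookup.contains)
}.collect()
```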