 

Apache Spark in-memory caching

Spark caches the working dataset into memory and then performs computations at memory speeds. Is there a way to control how long the working set resides in RAM?

I have a huge amount of data that is accessed across jobs. It takes time to load the data into RAM for the first job, and when the next job arrives it has to load all the data into RAM again, which is time consuming. Is there a way to cache the data permanently (or for a specified time) in RAM using Spark?

Atom asked Nov 11 '14


People also ask

Does Spark automatically cache data in memory as and when needed?

Spark automatically monitors every persist() and cache() call you make, checks usage on each node, and drops persisted data that is not used, following a least-recently-used (LRU) policy.
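As a rough illustration of that bookkeeping, here is a minimal local-mode sketch (app name and data are chosen here for illustration) that caches an RDD and then lists the RDD ids the context currently tracks as persistent via SparkContext.getPersistentRDDs:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheBookkeeping {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cache-bookkeeping").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 1000000)
    rdd.cache()   // only marks the RDD; nothing is stored yet
    rdd.count()   // the first action actually materializes the cached blocks

    // Spark registers every persisted RDD per context; under memory pressure
    // it evicts blocks on its own, least-recently-used first.
    println(sc.getPersistentRDDs.keys)   // ids of RDDs currently marked persistent

    sc.stop()
  }
}
```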

When should we use cache in Spark?

Caching is recommended in the following situations: for RDD re-use in iterative machine learning applications; for RDD re-use in standalone Spark applications; and when RDD computation is expensive, where caching can help reduce the cost of recovery in case an executor fails.

Can we cache a DataFrame in Spark?

cache() is an Apache Spark transformation that can be used on a DataFrame, Dataset, or RDD when you want to perform more than one action. cache() caches the specified DataFrame, Dataset, or RDD in the memory of your cluster's workers.
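For instance, a minimal sketch (the events.parquet path and the filter expression are placeholders): caching pays off when the same DataFrame feeds more than one action.

```scala
import org.apache.spark.sql.SparkSession

object DataFrameCacheExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("df-cache").master("local[*]").getOrCreate()

    // "events.parquet" and the filter are placeholders for illustration.
    val df = spark.read.parquet("events.parquet").filter("status = 'ok'")

    df.cache()                            // lazy: stored on the first action
    val total  = df.count()               // action 1 materializes the cache
    val sample = df.limit(10).collect()   // action 2 is served from cached blocks

    df.unpersist()
    spark.stop()
  }
}
```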

What is memory_and_disk in Apache Spark?

The default strategy in Apache Spark is MEMORY_AND_DISK. It is fine for the majority of pipelines: it uses all the available memory in the cluster and thus speeds up operations. If there is not enough memory for caching, Spark under this strategy saves the data on disk; reading blocks back from disk is usually faster than re-evaluating them.
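A short sketch of picking that level explicitly (local mode, synthetic data for illustration); note that for DataFrames, cache() is equivalent to persist(StorageLevel.MEMORY_AND_DISK):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistLevel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("persist-level").master("local[*]").getOrCreate()
    val df = spark.range(1000000L).toDF("id")   // synthetic data for illustration

    // Keep blocks in memory and spill evicted ones to disk rather than
    // dropping them; for DataFrames this is also what cache() does.
    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()   // materializes the cached blocks

    spark.stop()
  }
}
```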

What is cache in Apache Spark?

Using cache appropriately within Apache Spark allows you to be a master over your available resources. Memory is not free (although it can be cheap), and in many cases storing a DataFrame in memory is actually more expensive in the long run than going back to the source-of-truth dataset.

What are the levels of data persistence in Apache Spark?

There are several levels of data persistence in Apache Spark:

  1. MEMORY_ONLY. Data is cached in memory, in deserialized form only.
  2. MEMORY_AND_DISK. Data is cached in memory; if memory is insufficient, evicted blocks are serialized to disk.

How does Spark store data in memory?

Keeping data in memory improves performance by orders of magnitude. The main abstraction in Spark is the RDD, and RDDs are cached using the cache() or persist() method. When we use the cache() method, the RDD is stored entirely in memory.
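A spark-shell style sketch of that (data.txt is a placeholder input path): cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), and a storage level can only be assigned once per RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("rdd-cache").setMaster("local[*]"))

val rdd = sc.textFile("data.txt")   // placeholder input path

rdd.cache()   // shorthand for persist(StorageLevel.MEMORY_ONLY): deserialized, memory only
// rdd.persist(StorageLevel.MEMORY_AND_DISK)   // the alternative; a storage
// level can only be assigned once per RDD, so pick one or the other
rdd.count()   // the first action fills the cache
```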


1 Answer

To uncache explicitly, you can use RDD.unpersist().
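For example (a spark-shell style fragment, where rdd stands for whatever dataset was cached earlier):

```scala
// Assuming rdd is the dataset cached earlier: drop its blocks explicitly
// so the memory can be reused by other datasets.
rdd.unpersist()
```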

If you want to share cached RDDs across multiple jobs you can try the following:

  1. Cache the RDD in a single long-lived context and re-use that context for subsequent jobs. This way you cache once and read many times (see the sketch after this list).
  2. There are 'Spark job servers' built to provide exactly this functionality. Check out the Spark Job Server open-sourced by Ooyala.
  3. Use an external caching solution like Tachyon (now Alluxio).
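A minimal sketch of option 1, with placeholder paths and app names: a single long-lived SparkContext runs several actions (each a separate Spark job) against one cached RDD, so the data is loaded only once. A job server generalizes this by keeping such a context alive between externally submitted jobs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedContextCache {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shared-context-cache").setMaster("local[*]"))

    // Load and cache once; the cached blocks live as long as this context does.
    val data = sc.textFile("big-input.txt").cache()   // placeholder path

    // Each action below launches a separate Spark job, but both run inside
    // the same context and therefore reuse the cached blocks.
    val total  = data.count()                               // job 1: fills the cache
    val errors = data.filter(_.contains("ERROR")).count()   // job 2: reads from cache

    println(s"$errors error lines out of $total")
    sc.stop()
  }
}
```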

I have been experimenting with caching options in Spark. You can read more here: http://sujee.net/understanding-spark-caching/

Sujee Maniyam answered Oct 22 '22