Why is persist() lazily evaluated in Spark?

Tags:

apache-spark

I understand that in Spark there are two types of operations:

Transformations
Actions

Transformations like map() and filter() are evaluated lazily, so that optimization can be done when an action executes. For example, if I execute the action first(), Spark will optimize to read only the first line.

But why is the persist() operation evaluated lazily? Either way I go, eagerly or lazily, it is going to persist the entire RDD as per the storage level.

Can you please explain in detail why persist() behaves like a transformation instead of an action?

775

asked Dec 23 '15 15:12

dinesh028

1 Answer

For starters, eager persistence would pollute the whole pipeline. cache or persist only expresses an intention. It doesn't mean we'll ever get to the point where the RDD is materialized and can actually be cached. Moreover, there are contexts where data is cached automatically.
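To make the "intention only" point concrete, here is a minimal sketch, assuming a local Spark installation with spark-core on the classpath. The object name, app name, and data sizes are made up for illustration. It counts how often the map function runs by using an accumulator, and checks whether the RDD is registered as persistent before any action has run.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names and sizes are illustrative only.
object LazyPersistDemo {
  // Returns (evaluations after persist, marked as persistent?,
  //          evaluations after first count, evaluations after second count).
  def run(): (Long, Boolean, Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lazy-persist-demo")
    val sc = new SparkContext(conf)
    try {
      // Counts how many times the map function is actually invoked.
      val evaluations = sc.longAccumulator("evaluations")
      val doubled = sc.parallelize(1 to 1000, 4).map { x => evaluations.add(1); x * 2 }

      // Only records the requested storage level; no job is submitted here.
      doubled.persist(StorageLevel.MEMORY_ONLY)
      val afterPersist = evaluations.sum
      val marked = sc.getPersistentRdds.contains(doubled.id)

      doubled.count() // first action: computes the partitions and caches them
      val afterFirst = evaluations.sum

      doubled.count() // second action: served from the cache if everything fit
      val afterSecond = evaluations.sum

      (afterPersist, marked, afterFirst, afterSecond)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In a local run like this the counter should still be 0 right after persist, reach 1000 after the first count, and stay there on the second count, provided all four partitions fit in memory. If the pipeline never reaches an action, the persist call costs nothing.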

Either way I go, eagerly or lazily, it is going to persist the entire RDD as per the storage level.

That is not exactly true. The thing is, persist is not persistent. As clearly stated in the documentation for the MEMORY_ONLY storage level:

If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed.

With MEMORY_AND_DISK, the remaining data is stored on disk, but cached data can still be evicted if there is not enough memory for subsequent caching. Even more importantly:

Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion.
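You can check how much of an RDD is actually cached instead of assuming it is all there. The following is a rough sketch, again assuming a local Spark setup; the object name and data sizes are hypothetical. It relies on SparkContext.getRDDStorageInfo, which is marked as a developer API and reports only RDDs that currently hold cached blocks.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names and sizes are illustrative only.
object CacheCoverage {
  // Returns (cached partitions, total partitions) for a small persisted RDD.
  def run(): (Int, Int) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-coverage")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(1 to 100000, 8)
        .map(_.toString)
        .persist(StorageLevel.MEMORY_ONLY)

      lines.count() // the action that actually populates the cache

      // Compare cached partitions with the total; under memory pressure
      // the first number can be smaller than the second.
      val coverage = sc.getRDDStorageInfo
        .find(_.id == lines.id)
        .map(info => (info.numCachedPartitions, info.numPartitions))
        .getOrElse((0, lines.getNumPartitions))

      lines.unpersist() // release the cached blocks explicitly
      coverage
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    val (cached, total) = run()
    println(s"cached $cached of $total partitions")
  }
}
```

With data this small every partition will most likely be cached. The gap only shows up when executors run short of storage memory, which is exactly the case the documentation quotes describe.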

You can also argue that cache / persist is semantically different from Spark actions, which are executed for specific IO side effects. cache is more of a hint to the Spark engine that we may want to reuse this piece of code later.

138

answered Sep 21 '22 12:09

zero323
