I just want to ask about the specifics of how to successfully use checkpointInterval in Spark. Also, what is meant by this comment in the code for ALS: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
If the checkpoint directory is not set in [[org.apache.spark.SparkContext]], this setting is ignored.
Edit:
There are two types of data we checkpoint in Spark:

- Metadata checkpointing: metadata means data about the data. It is used to recover a streaming application's driver node from failure, and includes the configuration used to create the application, the DStream operations, and any incomplete batches.
- RDD checkpointing: the process of truncating an RDD's lineage graph and saving the actual intermediate RDD data. This itself comes in two forms (see the sketch after this list):
  - Reliable checkpointing, which saves the RDD data to a reliable distributed file system (e.g. HDFS).
  - Local checkpointing, which is a snapshot of the RDD at a given instant kept on the executors' local storage; it assumes that storage is stable enough that the process can roll back to any of its existing local checkpoints.
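A minimal sketch contrasting the two flavors; the app name, paths, and data below are illustrative assumptions, not from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("checkpoint-types-demo").setMaster("local[*]"))

// Reliable checkpointing requires a checkpoint directory first
// (a local path here; on a cluster this would be a DFS path).
sc.setCheckpointDir("/tmp/spark-checkpoints")

val reliable = sc.parallelize(1 to 100).map(_ * 2)
reliable.checkpoint()   // mark for reliable checkpointing before any action
reliable.count()        // the action triggers the actual write to the checkpoint dir

// Local checkpointing keeps the snapshot on executor storage instead:
// faster, but lost if an executor dies.
val local = sc.parallelize(1 to 100).map(_ + 1)
local.localCheckpoint()
local.count()
```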
Checkpointing can be used to truncate the logical plan of a DataFrame, which is especially useful in iterative algorithms where the plan may grow exponentially. The data will be saved to files inside the checkpoint directory set with SparkContext.setCheckpointDir(). New in version 2.1.
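To illustrate, a hedged sketch of how DataFrame.checkpoint cuts a plan that grows across iterations; `spark` is assumed to be an active SparkSession, and the path and loop bounds are examples:

```scala
spark.sparkContext.setCheckpointDir("/tmp/df-checkpoints")  // example path

var df = spark.range(0, 1000).toDF("id")
for (i <- 1 to 20) {
  df = df.withColumn("id", df("id") + 1)  // each iteration grows the logical plan
  if (i % 5 == 0) df = df.checkpoint()    // truncate the plan every 5 iterations
}
df.count()
```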
How can we set the checkpoint directory? Can we use any HDFS-compatible directory for this?
You can use SparkContext.setCheckpointDir. As far as I remember, in local mode both local and DFS paths work just fine, but on a cluster the directory must be an HDFS path.
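For example (both paths below are assumptions):

```scala
// Local mode: a plain local path is fine.
sc.setCheckpointDir("/tmp/spark-checkpoints")

// Cluster mode: the directory must be visible to all executors,
// so use an HDFS (or other DFS) path.
sc.setCheckpointDir("hdfs://namenode:8020/user/me/spark-checkpoints")
```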
Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid StackOverflowError?
It should help. See SPARK-1006.
PS: It seems that in order to actually perform checkpointing in ALS, the checkpointDir must be set, or checkpointing won't be effective [Ref. here.]
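Putting it together, a hedged sketch of ALS with checkpointing enabled; the input path, rank, and interval value are illustrative, and per the comment quoted above, setCheckpointInterval only takes effect once the checkpoint directory is set:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")  // without this, the interval is ignored

// Hypothetical input: "user,product,rating" lines.
val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, product, rating) = line.split(',')
  Rating(user.toInt, product.toInt, rating.toDouble)
}

val model = new ALS()
  .setRank(10)
  .setIterations(30)          // many iterations => long lineage without checkpointing
  .setCheckpointInterval(5)   // checkpoint the factor RDDs every 5 iterations
  .run(ratings)
```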