While using apache-spark, I was trying to apply the "reduceByKeyAndWindow()" transformation on some streaming data, and got the following error:
pyspark.sql.utils.IllegalArgumentException: requirement failed: The checkpoint directory has not been set. Please set it by StreamingContext.checkpoint().
Is it necessary to set a checkpoint directory?
If yes, what is the easiest way to set one up?
Yes, it is necessary. Checkpointing must be enabled for applications with any of the following requirements:
Usage of stateful transformations - If either updateStateByKey or reduceByKeyAndWindow (with an inverse function) is used in the application, then the checkpoint directory must be provided to allow for periodic RDD checkpointing.
Recovering from failures of the driver running the application - Metadata checkpoints are used to recover with progress information. You can set up the checkpoint directory with ssc.checkpoint(checkpointDirectoryLocation), where ssc is your StreamingContext (note: it is set on the StreamingContext, not the SparkContext).
http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing