I have a CustomReceiver that receives a single event (a String). The Spark application uses that event at run time to read data from a NoSQL store and apply transformations. When I observed that the processing time for each batch was greater than the batch interval, I set this property:
spark.streaming.backpressure.enabled=true
After that, I expected the CustomReceiver not to trigger and receive the event while a batch was taking longer than the batch interval to process. That didn't happen, and a backlog of batches kept building up. Am I missing something here?
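For context, here is a minimal sketch of how the property above could be set in code, together with two related rate properties from the Spark configuration reference. The master URL, application name, object name, and the rate values are assumptions chosen for illustration, not settings from the original application. As far as the Spark documentation describes it, backpressure adjusts the maximum receiving rate of receivers (records per second); it is not documented as pausing batch scheduling, which may be why batches keep queuing.

```scala
import org.apache.spark.SparkConf

// Hypothetical object name; only the property keys come from Spark itself.
object BackpressureConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")                 // assumption: local test run
      .setAppName("BackpressureConfSketch")  // assumption: illustrative name
      // The property from the question: lets Spark Streaming adapt the receiving rate.
      .set("spark.streaming.backpressure.enabled", "true")
      // Illustrative value: receiving rate used for the first batch when backpressure is on.
      .set("spark.streaming.backpressure.initialRate", "1")
      // Illustrative value: upper bound on records per second for each receiver.
      .set("spark.streaming.receiver.maxRate", "1")

    // Print the effective settings so they can be checked before starting a StreamingContext.
    println(conf.toDebugString)
  }
}
```

The same keys can also be passed on the command line with --conf instead of being hard-coded.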
Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches. Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data.
Discretized Stream, or DStream, is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data: either the input data stream received from the source, or the processed data stream generated by transforming the input stream.
def saveAsObjectFiles(prefix: String, suffix: String = ""): Unit. Save each RDD in this DStream as a Sequence file of serialized objects. The file name at each batch interval is generated based on prefix and suffix: "prefix-TIME_IN_MS.
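A short usage sketch for saveAsObjectFiles may help here. It assumes a local master, a one-second batch interval, a hypothetical output prefix of /tmp/events, and a queue-backed test stream in place of a real receiver; none of these come from the original question.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical object name and paths, for illustration only.
object SaveAsObjectFilesSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")                  // assumption: local test run
      .setAppName("SaveAsObjectFilesSketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // assumption: 1-second batches

    // A queue of RDDs stands in for a real input source such as a custom receiver.
    val queue = mutable.Queue[RDD[String]]()
    queue += ssc.sparkContext.parallelize(Seq("event-1", "event-2"))
    val events = ssc.queueStream(queue)

    // Each batch's RDD is written as a Sequence file of serialized objects,
    // with the output name built from the prefix, the batch time, and the suffix.
    events.saveAsObjectFiles("/tmp/events", "obj")

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)       // run for about five seconds
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```

A queue-backed stream keeps the sketch self-contained; in a real job the DStream would come from the receiver instead.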
Apache Spark Streaming is a separate library in the Spark engine designed to process streaming or continuously flowing data. It uses the DStream API, powered by Spark RDDs (Resilient Distributed Datasets), to divide the data into chunks before processing it.
Try checking this article and this one.