How does back pressure property work in Spark Streaming?

I have a CustomReceiver which receives a single event (String). The received event is used during the Spark application's run time to read data from a NoSQL store and to apply transformations. When the processing time for each batch was observed to be greater than the batch interval, I set this property:

spark.streaming.backpressure.enabled=true

After that I expected the CustomReceiver not to trigger and receive the event while a batch takes longer than the batch window to process. That didn't happen, and a backlog of batches kept building up. Am I missing something here? The relevant setup is sketched below.
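Roughly, the setup looks like this (the app name, batch interval, rate values, and the CustomReceiver constructor are placeholders, not the actual application code):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("backpressure-demo")                          // placeholder app name
  // Let Spark adjust the receiving rate based on batch scheduling delays:
  .set("spark.streaming.backpressure.enabled", "true")
  // Related knobs that bound the ingest rate of receiver-based streams:
  .set("spark.streaming.backpressure.initialRate", "100")   // rate used before the first feedback
  .set("spark.streaming.receiver.maxRate", "1000")          // hard cap, records per second

val ssc = new StreamingContext(conf, Seconds(5))            // 5s batch interval (placeholder)

// CustomReceiver is my receiver class; constructor args omitted here.
val events = ssc.receiverStream(new CustomReceiver(/* ... */))

events.foreachRDD { rdd =>
  // the received event drives the NoSQL reads and transformations
}

ssc.start()
ssc.awaitTermination()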

asked Jan 25 '17 by darkknight444

People also ask

How does Spark Streaming work internally?

Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches. Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data.
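A minimal word-count stream makes the batching visible (host, port, app name, and interval below are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Every 10 seconds the lines received so far form one batch (one RDD),
// which the Spark engine processes like any other RDD job.
val ssc = new StreamingContext(
  new SparkConf().setAppName("batching-demo").setMaster("local[2]"), Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()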

What is DStream in Apache Spark? How does it work?

Discretized Stream or DStream is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data, either the input data stream received from source, or the processed data stream generated by transforming the input stream.
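A short sketch using queueStream (a testing source built from a queue of RDDs; names below are illustrative) shows both kinds of DStream, the input stream and the one derived by transformation:

import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("dstream-demo").setMaster("local[2]"), Seconds(1))

// Input DStream: one RDD from the queue is consumed per batch interval.
val queue = mutable.Queue[RDD[Int]](ssc.sparkContext.makeRDD(1 to 10))
val input = ssc.queueStream(queue)

// Processed DStream: each transformation is applied to every batch's RDD.
val squares = input.map(x => x * x)
squares.print()

ssc.start()
ssc.awaitTermination()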

What is the use of the saveAsObjectFiles() operation on DStreams?

def saveAsObjectFiles(prefix: String, suffix: String = ""): Unit. Save each RDD in this DStream as a sequence file of serialized objects. The file name at each batch interval is generated based on prefix and suffix: "prefix-TIME_IN_MS[.suffix]".
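For example (the output path, suffix, source, and intervals below are placeholders), each batch of word counts is written to its own object file:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("save-demo").setMaster("local[2]"), Seconds(30))

val lines = ssc.socketTextStream("localhost", 9999)          // placeholder source
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

// Writes one sequence file of serialized objects per batch, named
// "hdfs:///tmp/wordcounts-<TIME_IN_MS>.obj".
counts.saveAsObjectFiles("hdfs:///tmp/wordcounts", "obj")

ssc.start()
ssc.awaitTermination()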

What method does Spark use to perform Streaming operations?

Apache Spark Streaming is a separate library in the Spark engine designed to process streaming or continuously flowing data. It utilizes the DStream API, powered by Spark RDDs (Resilient Distributed Datasets), to divide the data into batches before processing it.


1 Answer

Try checking this and this article.

answered Sep 29 '22 by Eugene Lopatkin