In spark streaming, the DStreams we receive is a batch of RDDs. So how does windowing helps further.
As per my understanding it also batches the RDDs.
Correct me if I am wrong (new to Spark Streaming).
Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row.
For example if you set batch interval 5 seconds - Spark Streaming will collect data for 5 seconds and then kick out calculation on RDD with that data. window size - it is interval of time in seconds for how much historical data shall be contained in RDD before processing.
Sliding Window controls transmission of data packets between various computer networks. Spark Streaming library provides windowed computations where the transformations on RDDs are applied over a sliding window of data.
The number of records in one batch is determined by the batch interval. A window will keep the number of batches as fit within the size of a window, that's why the window size must be a multiple of the batch interval. Your operations will then run on multiple batches, and with each new batch the window will move forward, discarding older batches.
The point is that in streaming, data that belongs together doesn't necessarily arrive at the same time, especially at low batch intervals. With windows you are essentially looking back in time.
But note that your job still runs at the specified batch interval, so it will produce output at the same pace as before but look at more data at once. You will also look at the same data multiple times!
There a nice blog post by Michael Noll which explains this in more detail: http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/.
Update:
You can increase your batch interval, but then your job is processing slower as well, i.e. only creating output every 10 seconds instead of 2. You can also put a window on one part of the computation, whereas the batch interval affects everything. Check out reduceByKeyAndWindow
for example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With