What is spark.streaming.receiver.maxRate? How does it work with batch interval

I am working with Spark 1.5.2. I understand what a batch interval is: essentially the interval after which the processing part should start on the data received from the receiver. But I do not understand what spark.streaming.receiver.maxRate is. From some research it is apparently an important parameter.

Let's consider a scenario. My batch interval is set to 60 s, and spark.streaming.receiver.maxRate is set to 60*1000. What if I receive 60*2000 records in 60 s due to some temporary load? What would happen? Will the additional 60*1000 records be dropped? Or would the processing happen twice during that batch interval?
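For reference, this is how such a configuration would typically be set up; a hypothetical sketch, with the app name and the 1,000 records/second value chosen purely for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical setup sketch: throttle each receiver to 1,000 records/second.
val conf = new SparkConf()
  .setAppName("maxRateExample")
  .set("spark.streaming.receiver.maxRate", "1000") // records per second, per receiver
val ssc = new StreamingContext(conf, Seconds(60))  // 60 s batch interval
```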

asked Dec 02 '15 by nish


1 Answer

The property spark.streaming.receiver.maxRate limits the number of records received per second.

The receiver max rate is applied when receiving data from the stream, that is, even before the batch interval applies. In other words, you will never get more records per second than the value set in spark.streaming.receiver.maxRate. The additional records simply "stay" in the source (e.g. Kafka, a network buffer, ...) and get processed in the next batch. Nothing is dropped, and processing does not run twice within a batch interval.
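Under these per-second semantics, the numbers from the question work out as follows. This is a plain-Python sketch of the throttling arithmetic (the function and variable names are illustrative, not part of any Spark API):

```python
def records_per_batch(arrival_rate, max_rate, batch_interval_s):
    """Records the receiver accepts in one batch when the source produces
    `arrival_rate` records/second and the receiver is throttled to
    `max_rate` records/second (spark.streaming.receiver.maxRate)."""
    return min(arrival_rate, max_rate) * batch_interval_s

# Question's scenario: maxRate = 60*1000 records/second, 60 s batches.
# A burst of 60*2000 records over 60 s is an arrival rate of 2,000 records/second,
# well under the 60,000 records/second cap, so nothing is held back:
print(records_per_batch(2000, 60 * 1000, 60))  # 120000

# If the cap were 1,000 records/second instead, the receiver would accept only
# 60,000 records per batch; the excess stays in the source (e.g. Kafka) and is
# consumed in later batches rather than dropped:
print(records_per_batch(2000, 1000, 60))       # 60000
```

Note that maxRate caps the rate per second, not per batch, which is why the question's burst fits comfortably within one batch in the first case.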

answered Mar 03 '23 by vanekjar