
How many RDDs does DStream generate for a batch interval?

Does one batch interval of data generate exactly one RDD in a DStream, regardless of how much data arrives in that interval?

Guo asked Feb 02 '16 22:02


1 Answer

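It's very late to reply to this thread, but it's still worth adding a few points. Each DStream generates exactly one RDD per batch interval, no matter how much data arrived; the volume of data affects the size and number of partitions of that RDD, not the number of RDDs. The total number of RDDs per batch therefore depends on how many receivers you have in your application: each receiver backs its own input DStream, so several receivers give you several RDDs per batch, one per DStream. If you have only one receiver, or a receiver-less source such as the Kafka direct stream, you get only one RDD per batch.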
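To make the counting concrete, here is a small pure-Python toy model. It is not Spark code and uses no Spark API; the function names, receiver names, and sample records are all made up for illustration. It assumes each receiver feeds its own input stream and that each stream yields one batch (standing in for one RDD) per batch interval.

```python
from collections import defaultdict


def batches_for_stream(records, batch_interval):
    """Group (timestamp, value) records into one batch per interval.

    Each list in the result stands in for the single RDD that one
    DStream produces for that batch interval. Intervals with no records
    are simply absent in this toy model.
    """
    grouped = defaultdict(list)
    for timestamp, value in records:
        grouped[int(timestamp // batch_interval)].append(value)
    return dict(grouped)


def rdds_in_batch(streams, batch_index):
    """Collect the per-stream batches that belong to one batch interval."""
    return {
        name: batches[batch_index]
        for name, batches in streams.items()
        if batch_index in batches
    }


# Two hypothetical receivers, i.e. two input streams (timestamps in seconds).
receiver_a = [(0.1, "a1"), (0.5, "a2"), (0.9, "a3"), (1.2, "a4")]
receiver_b = [(0.3, "b1"), (1.7, "b2")]

streams = {
    "receiver_a": batches_for_stream(receiver_a, batch_interval=1.0),
    "receiver_b": batches_for_stream(receiver_b, batch_interval=1.0),
}

first_batch = rdds_in_batch(streams, 0)
print(len(first_batch))            # number of RDD stand-ins in batch 0
print(first_batch["receiver_a"])   # all of receiver_a's records for batch 0
```

In this sketch, receiver_a contributes three records to the first interval and receiver_b only one, yet each contributes a single batch. With only one stream in `streams`, every interval would contain a single batch regardless of how many records it holds.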

Mohammad Tameem answered Sep 26 '22 17:09