I have a couple of basic questions related to Spark Streaming
[Please let me know if these questions have been answered in other posts - I couldn't find any]:
(i) In Spark Streaming, is the number of partitions in an RDD by default equal to the number of workers?
(ii) In the Direct Approach for Spark-Kafka integration, the number of RDD partitions created is equal to the number of Kafka partitions.
Is it right to assume that each RDD partition i would be mapped to the same worker node j in every batch of the DStream? That is, is the mapping of a partition to a worker node based solely on the index of the partition, or could partition 2, for example, be assigned to worker 1 in one batch and worker 3 in another?
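For concreteness, this is roughly the setup I have in mind. It's only a sketch, and the broker address, topic name, and group id below are made up:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer

object DirectStreamPartitions {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("direct-stream-partitions")
      .setMaster("local[*]")                              // local master just for testing

    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",           // made-up broker address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "partition-question"        // made-up group id
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Each batch RDD has one partition per Kafka partition; offsetRanges(i)
    // describes the Kafka topic-partition backing RDD partition i.
    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.zipWithIndex.foreach { case (r, i) =>
        println(s"RDD partition $i <- ${r.topic}-${r.partition} [${r.fromOffset}, ${r.untilOffset})")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```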
Thanks in advance
Spark Streaming receives real-time data and divides it into smaller batches for the execution engine, whereas Structured Streaming is built on the Spark SQL API for processing data streams. Structured Streaming queries are planned by the Catalyst optimizer, and in the end both APIs are translated into RDD operations for execution under the hood.
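As a rough side-by-side sketch of the two APIs (the app names and the socket host/port are placeholders), a word count looks like this in each:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SparkSession

object StreamingApiComparison {

  // Classic Spark Streaming (DStream API): input is chopped into 5-second
  // micro-batches and each batch is processed as an RDD.
  def dstreamWordCount(): Unit = {
    val conf = new SparkConf().setAppName("dstream-wordcount").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)   // placeholder test source
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }

  // Structured Streaming: the same job expressed as a query over an unbounded
  // DataFrame; planning goes through the Catalyst optimizer.
  def structuredWordCount(): Unit = {
    val spark = SparkSession.builder
      .appName("structured-wordcount").master("local[2]").getOrCreate()
    import spark.implicits._
    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
    val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()
    counts.writeStream.outputMode("complete").format("console").start().awaitTermination()
  }

  def main(args: Array[String]): Unit = dstreamWordCount() // or structuredWordCount()
}
```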
The offset is a simple integer that Kafka uses to maintain the current position of a consumer. That's it. The current offset is a pointer to the last record that Kafka has already sent to a consumer in the most recent poll, and it is because of the current offset that the consumer doesn't get the same record twice.
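A minimal sketch with the plain Kafka consumer API makes this visible (broker address, topic, and group id are placeholders): position() moves forward after every poll, so the same records are not handed out again.

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object CurrentOffsetDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")     // placeholder broker
    props.put("group.id", "offset-demo")                 // placeholder group id
    props.put("key.deserializer",   "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    val tp = new TopicPartition("events", 0)             // placeholder topic/partition
    consumer.assign(java.util.Collections.singletonList(tp))

    for (_ <- 1 to 3) {
      val records = consumer.poll(Duration.ofSeconds(1))
      // position() is the "current offset": one past the last record returned
      // by the previous poll, so each poll starts where the last one ended.
      println(s"fetched ${records.count()} records, current offset = ${consumer.position(tp)}")
    }
    consumer.close()
  }
}
```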
An important setting is spark.streaming.kafka.maxRatePerPartition, which is the maximum rate (in messages per second) at which each Kafka partition will be read by the direct API. Deploying: this is the same as in the first approach.
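As a hedged illustration (the limit of 1000 messages per second is just an example value), the throttle is set on the SparkConf before the StreamingContext is created:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ThrottledContext {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("throttled-direct-stream")
      .setMaster("local[*]")                                   // local master just for testing
      // Cap each Kafka partition at 1000 messages/second (example value).
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")

    // With a 10-second batch interval, each batch then reads at most
    // 10 * 1000 = 10,000 records per Kafka partition.
    val ssc = new StreamingContext(conf, Seconds(10))
    println(ssc.sparkContext.getConf.get("spark.streaming.kafka.maxRatePerPartition"))

    // ... create the direct stream as in the sketch above, then ssc.start() ...
    ssc.stop()
  }
}
```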
(i) The default parallelism is the number of cores (or 8 for Mesos), but the number of partitions is up to the input stream implementation.
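You can see the two values for yourself; this is only a sketch, with a placeholder socket source and local[4] standing in for a cluster:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PartitionInspection {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-inspection").setMaster("local[4]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Default parallelism follows the available cores (4 here, from local[4]);
    // it is used for shuffles, not for deciding the input partitioning.
    println(s"default parallelism = ${ssc.sparkContext.defaultParallelism}")

    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source

    // The partition count of each batch RDD comes from the input implementation;
    // for the Kafka direct stream it equals the number of Kafka partitions.
    lines.foreachRDD { rdd =>
      println(s"partitions in this batch = ${rdd.getNumPartitions}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```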
(ii) No, the mapping of partition indexes to worker nodes is not deterministic. If you're running Kafka on the same nodes as your Spark executors, the preferred location for the task will be the node of the Kafka leader for that partition, but even then the task may be scheduled on another node.
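For the Kafka 0.10 direct stream this preference is what the location strategies express; a small sketch (the object and val names are just illustrative):

```scala
import org.apache.spark.streaming.kafka010.LocationStrategies

object LocationStrategyChoice {
  // PreferBrokers: only when executors run on the same hosts as the Kafka
  // brokers; the task for each partition then prefers the node of that
  // partition's leader.
  val colocated = LocationStrategies.PreferBrokers

  // PreferConsistent is the usual choice: spread partitions evenly across the
  // available executors. In both cases this is only a scheduling *preference*;
  // a task may still run on another node, so partition i is not pinned to the
  // same worker in every batch.
  val spreadOut = LocationStrategies.PreferConsistent
}
```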