Definition in docs:
org.apache.spark.streaming.kafka
Class KafkaUtils
static JavaPairReceiverInputDStream<String,String> createStream(JavaStreamingContext jssc, String zkQuorum, String groupId, java.util.Map<String,Integer> topics)
Create an input stream that pulls messages from Kafka Brokers.
Why is topics a Map (rather than a string array)?
I understand that the string key is the topic name. But what about the integer value? What should I fill in?
Read the Javadoc:
public static JavaPairReceiverInputDStream createStream(JavaStreamingContext jssc, String zkQuorum, String groupId, java.util.Map topics)
Create an input stream that pulls messages from Kafka Brokers. Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2.
Parameters: jssc - JavaStreamingContext object
zkQuorum - Zookeeper quorum (hostname:port,hostname:port,..)
groupId - The group id for this consumer
topics - Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread
Returns: DStream of (Kafka message key, Kafka message value)
The value of the Map
is the number of partitions of the given topic name, which determines the number of threads that will be used to consume the topic.
If you see the documentation of the createStream
method of KafkaUtils
here, you'd see
topics - Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread
The Integer value is the number of partitions for the topic as part of the key in the map.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With