Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the "topics" argument of KafkaUtils.createStream() a Map rather then array?

Definition in docs:

org.apache.spark.streaming.kafka

Class KafkaUtils

static JavaPairReceiverInputDStream<String,String> createStream(JavaStreamingContext jssc, String zkQuorum, String groupId, java.util.Map<String,Integer> topics)

Create an input stream that pulls messages from Kafka Brokers.

Why is topics a Map (rather than a string array)?

I understand that the string key is the topic name. But what about the integer value? What should I fill in?

like image 683
Alon Avatar asked Mar 03 '23 21:03

Alon


2 Answers

Read the Javadoc:

public static JavaPairReceiverInputDStream createStream(JavaStreamingContext jssc, String zkQuorum, String groupId, java.util.Map topics)

Create an input stream that pulls messages from Kafka Brokers. Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2.

Parameters: jssc - JavaStreamingContext object

zkQuorum - Zookeeper quorum (hostname:port,hostname:port,..)

groupId - The group id for this consumer

topics - Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread

Returns: DStream of (Kafka message key, Kafka message value)

The value of the Map is the number of partitions of the given topic name, which determines the number of threads that will be used to consume the topic.

like image 170
Eran Avatar answered Mar 06 '23 10:03

Eran


If you see the documentation of the createStream method of KafkaUtils here, you'd see

topics - Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread

The Integer value is the number of partitions for the topic as part of the key in the map.

like image 39
Madhu Bhat Avatar answered Mar 06 '23 12:03

Madhu Bhat