I am learning Kafka and I want to know how to specify then partition when I consume messages from a topic.
I have found several pictures like this:
It means that a consumer can consume messages from several partitions but a partition can only be read by a single consumer (within a consumer group).
Also, I have read several examples for consumer and they look like this:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "consumer-tutorial");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
and:
Subscribe:
consumer.subscribe(Arrays.asList(“foo”, “bar”));
Poll
try {
while (running) {
ConsumerRecords<String, String> records = consumer.poll(1000);
for (ConsumerRecord<String, String> record : records)
System.out.println(record.offset() + ": " + record.value());
}
} finally {
consumer.close();
}
How does this work? From which partition will I read messages?
A consumer can be assigned to consume multiple partitions. So the rule in Kafka is only one consumer in a consumer group can be assigned to consume messages from a partition in a topic and hence multiple Kafka consumers from a consumer group can not read the same message from a partition.
To use this mode, instead of subscribing to the topic using subscribe , you just call assign(Collection) with the full list of partitions that you want to consume. String topic = "foo"; TopicPartition partition0 = new TopicPartition(topic, 0); TopicPartition partition1 = new TopicPartition(topic, 1); consumer.
Conservatively, you can estimate that a single partition for a single Kafka topic runs at 10 MB/s. As an example, if your desired throughput is 5 TB per day. That figure comes out to about 58 MB/s. Using the estimate of 10 MB/s per partition, this example implementation would require 6 partitions.
The consumers in a group divide the topic partitions as fairly amongst themselves as possible by establishing that each partition is only consumed by a single consumer from the group. When the number of consumers is lower than partitions, same consumers are going to read messages from more than one partition.
There are two ways to tell what topic/partitions you want to consume: KafkaConsumer#assign() (you specify the partition you want and the offset where you begin) and subscribe
(you join a consumer group, and partition/offset will be dynamically assigned by group coordinator depending of consumers in the same consumer group, and may change during runtime)
In both case, you need to poll
to receive data.
See https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html, especially paragraphs Consumer Groups and Topic Subscriptions
and Manual Partition Assignment
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With