I am starting to learn Kafka. During my reading, some questions came to mind:
When a producer is producing a message - it will specify the topic it wants to send the message to, is that right? Does it care about partitions?
When a subscriber is running - does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?
Does each consumer group have a corresponding partition on the broker or does each consumer have one?
Are the partitions created by the broker, and therefore not a concern for the consumers?
Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
What happens when a message is deleted from the queue? For example, if the retention period was 3 hours and that time has passed, how is the offset handled on both sides?
Kafka Partitioning

Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster.
But here are a few general rules:
- maximum 4,000 partitions per broker (in total; distributed over many topics)
- maximum 200,000 partitions per Kafka cluster (in total; distributed over many topics)
- resulting in a maximum of 50 brokers per Kafka cluster
Conservatively, you can estimate that a single partition for a single Kafka topic runs at 10 MB/s. As an example, if your desired throughput is 5 TB per day, that comes out to about 58 MB/s. Using the estimate of 10 MB/s per partition, this example would require 6 partitions.
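To make the arithmetic concrete, here is a minimal sketch of creating such a 6-partition topic with the Java AdminClient; the topic name `events`, the broker address, and the replication factor of 3 are illustrative assumptions:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions to cover ~58 MB/s at ~10 MB/s per partition;
            // replication factor 3 (assumed here) for fault tolerance.
            NewTopic topic = new NewTopic("events", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```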
This post already has answers, but I am adding my view with a few pictures from Kafka: The Definitive Guide.
Before answering the questions, let's look at an overview of producer components:
1. When a producer is producing a message - it will specify the topic it wants to send the message to, is that right? Does it care about partitions?
The producer will decide the target partition for each message (see the producer sketch below), depending on:
- the partition id, if it is specified within the message;
- otherwise hash(key) % number of partitions, if a message key is present;
- otherwise round robin, if neither a partition id nor a message key is available.
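Here is a minimal producer sketch illustrating all three cases; the topic name `events`, the keys, and the explicit partition number are illustrative assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic and value only: the partitioner spreads messages across partitions.
            producer.send(new ProducerRecord<>("events", "no-key-message"));

            // Topic, key, value: hash(key) % numPartitions selects the partition.
            producer.send(new ProducerRecord<>("events", "user-42", "keyed-message"));

            // Topic, explicit partition, key, value: partition 3 is used directly.
            producer.send(new ProducerRecord<>("events", 3, "user-42", "pinned-message"));
        }
    }
}
```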
2. When a subscriber is running - Does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?
You should always configure group.id unless you are using the simple assignment API and you don't need to store offsets in Kafka (as the Kafka consumer documentation puts it). A consumer without a group.id will not be part of any group.
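As an illustration, here is a minimal sketch of a consumer that joins a group via group.id and subscribes to a topic; the group name `billing-service` and topic `events` are assumptions:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All consumers sharing this group.id split the topic's partitions among themselves.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```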
3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?
In one consumer group, each partition will be processed by one consumer only. These are the possible scenarios:
- fewer consumers than partitions: some consumers read from more than one partition;
- as many consumers as partitions: each consumer reads from exactly one partition;
- more consumers than partitions: the surplus consumers stay idle.
A sketch for observing the assignment follows this list.
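One way to see the assignment in action is to attach a ConsumerRebalanceListener when subscribing and run several instances of the program; this is a sketch under the same assumed group and topic names as above:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

public class RebalanceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("events"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Run several instances: the topic's partitions are split among them.
                    System.out.println("Assigned: " + partitions);
                }

                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    System.out.println("Revoked: " + partitions);
                }
            });
            while (true) {
                consumer.poll(Duration.ofMillis(500));
            }
        }
    }
}
```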
4. Are the partitions created by the broker, and therefore not a concern for the consumers?
The consumer should be aware of the number of partitions, as discussed in question 3.
5. Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
Kafka (to be specific, the Group Coordinator) takes care of the offset state by producing a message to an internal __consumer_offsets topic. This behavior can be switched to manual by setting enable.auto.commit to false; in that case, consumer.commitSync() and consumer.commitAsync() can be used to manage offsets.
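A minimal sketch of that manual-commit pattern, with enable.auto.commit set to false and commitSync() called only after the polled batch has been processed; topic and group names are again assumptions, and process() is a hypothetical placeholder:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // take over offset management
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical application logic
                }
                // Commit the offsets of everything returned by the last poll,
                // but only after it has been processed.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}
```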
6. What happens when a message is deleted from the queue? For example, if the retention period was 3 hours and that time has passed, how is the offset handled on both sides?
If a consumer starts after the retention period has passed, messages will be consumed according to the auto.offset.reset configuration, which can be latest or earliest. In practice the effect here is the same as latest (start processing new messages), because all the older messages have expired by that time; retention is a topic-level configuration.
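For illustration, a sketch of how the property could be set on the consumer; the values shown are the two valid options, and the group name is an assumption:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;

public class OffsetResetConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");
        // Where to start when there is no committed offset, or the committed
        // offset points at data already deleted by retention:
        //   "earliest" -> the oldest offset still available on the broker
        //   "latest"   -> the end of the log (the default)
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return props;
    }
}
```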