I am looking for some clarification on the subject. In the Kafka documentation I found the following:
Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
So here are my questions:
Does it mean that if I want to have more than 1 consumer (from the same group) reading from one topic, I need to have more than 1 partition?
Does it mean I need the same number of partitions as consumers in the same group?
How many consumers can read from one partition?
I also have some questions about the relationship between keys and partitions with regard to the API. I have only looked at the .NET APIs (especially the one from Microsoft), but they appear to mimic the Java API. I see that when using a producer to send a message to a topic there is a key parameter, but when a consumer reads from a topic there is a partition number.
Thanks in advance.
Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster.
Kafka's topics are divided into several partitions. While the topic is a logical concept in Kafka, a partition is the smallest storage unit that holds a subset of records owned by a topic. Each partition is a single log file to which records are written in an append-only fashion.
For most implementations you want to follow the rule of thumb of around 10 partitions per topic and 10,000 partitions per Kafka cluster. Going beyond those amounts can require additional monitoring and optimization.
In Kafka, each record is stored as a key-value pair, and storage happens at the partition level. The record key is what the producer uses to decide which partition a record is written to, so records with the same key land in the same partition.
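The key-to-partition mapping can be sketched in a few lines. This is a simplified model, not Kafka's real DefaultPartitioner (which hashes the serialized key bytes with murmur2); the point is only that the mapping is deterministic, so the same key always lands in the same partition.

```java
// Simplified sketch of key-based partitioning (NOT Kafka's real
// partitioner, which uses murmur2 over the serialized key bytes).
public class KeyPartitioning {

    // Hypothetical helper: map a record key to one of numPartitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is non-negative,
        // then take the remainder to pick a partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key -> same partition, so per-key ordering is preserved.
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3)); // true
    }
}
```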
Does it mean if i want to have more than 1 consumer (from the same group) reading from one topic I need to have more than 1 partition?
The key property of Kafka here is that each partition is consumed by exactly one consumer within a subscribing consumer group. With this property, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes.
To answer your question: yes, in the context of the same group, if you want to have N consumers all actively reading, you have to have at least N partitions.
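A small simulation makes the "at least N partitions" point concrete. This is a simplified round-robin assignment, not Kafka's actual RangeAssignor/StickyAssignor (the consumer and group names are hypothetical): each partition goes to exactly one consumer in the group, so consumers beyond the partition count receive nothing.

```java
import java.util.*;

// Sketch of how one consumer group spreads partitions over its
// consumers. Simplified round-robin assignment for illustration.
public class GroupAssignment {

    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Each partition is owned by exactly one consumer.
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 consumers but only 3 partitions: consumer "c4" sits idle.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
        // prints {c1=[0], c2=[1], c3=[2], c4=[]}
    }
}
```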
Does it mean I need same amount of partitions as amount of consumers for the same group?
As explained above: you need at least as many partitions as active consumers. A single consumer can read from several partitions, so you may have fewer consumers than partitions, but any consumers beyond the partition count will sit idle.
How many consumers can read from one partition?
Within one consumer group, at most one consumer reads from a given partition. Across groups, each subscribing group gets its own reader, so the number of consumers that can read from one partition equals the number of consumer groups subscribing to that topic.
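This "one reader per partition per group" rule can be sketched as follows. The group names and the modulo ownership rule are hypothetical; in real Kafka the group coordinator performs the assignment. The sketch shows that each group independently picks exactly one of its consumers for a given partition.

```java
import java.util.*;

// Sketch: each subscribing group independently assigns partition p
// to one of its consumers, so the readers of a partition = one
// consumer per subscribing group.
public class ReadersPerPartition {

    static Map<String, String> readersOf(int partition, Map<String, Integer> groupSizes) {
        Map<String, String> readers = new TreeMap<>();
        for (Map.Entry<String, Integer> g : groupSizes.entrySet()) {
            // Simplified ownership rule: partition index modulo group size.
            readers.put(g.getKey(), g.getKey() + "-consumer-" + (partition % g.getValue()));
        }
        return readers;
    }

    public static void main(String[] args) {
        // Two hypothetical groups with 2 and 3 consumers respectively.
        Map<String, Integer> groups = Map.of("billing", 2, "analytics", 3);
        // Partition 1 is read by exactly one consumer from each group.
        System.out.println(readersOf(1, groups));
        // prints {analytics=analytics-consumer-1, billing=billing-consumer-1}
    }
}
```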
Relationship between keys and partitions with regard to API
First, we must understand that the producer is responsible for choosing which record to assign to which partition within the topic. Now, let's see how the producer does so, starting with the class definition of ProducerRecord.java:
public class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;
}
Here, the field that we have to understand from the class is partition.
From the ProducerRecord docs: if a valid partition number is specified, that partition will be used when sending the record. If no partition is specified but a key is present, a partition will be chosen using a hash of the key. If neither key nor partition is present, a partition will be assigned in a round-robin fashion.
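Those three rules can be sketched in plain Java. This is a simplified model, not Kafka's actual DefaultPartitioner (which uses murmur2 hashing and, in recent versions, a sticky strategy rather than strict round-robin for keyless records); it only shows the precedence: explicit partition first, then key hash, then round-robin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the partition-selection rules quoted from the
// ProducerRecord docs, in precedence order.
public class PartitionChooser {
    private final int numPartitions;
    private final AtomicInteger roundRobin = new AtomicInteger(0);

    PartitionChooser(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    int choose(Integer explicitPartition, String key) {
        if (explicitPartition != null) {
            return explicitPartition;               // rule 1: caller chose the partition
        }
        if (key != null) {
            // rule 2: hash of the key (Kafka really uses murmur2)
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }
        // rule 3: neither key nor partition -> round-robin
        return roundRobin.getAndIncrement() % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser(3);
        System.out.println(chooser.choose(2, "ignored-key")); // 2 (explicit partition wins)
        System.out.println(chooser.choose(null, null));       // 0 (round-robin)
        System.out.println(chooser.choose(null, null));       // 1 (round-robin)
    }
}
```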