I am looking for some clarification on the subject. In the Kafka documentation I found the following:
Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
So here are my questions:
Does it mean that if I want to have more than 1 consumer (from the same group) reading from one topic, I need to have more than 1 partition?
Does it mean I need the same number of partitions as consumers in the same group?
How many consumers can read from one partition?
I also have some questions about the relationship between keys and partitions with regard to the API. I have only looked at the .NET APIs (especially the one from Microsoft), but they appear to mimic the Java API. I see that when using a producer to send a message to a topic there is a key parameter, but when a consumer reads from a topic there is a partition number.
Thanks in advance.
Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster.
Kafka's topics are divided into several partitions. While the topic is a logical concept in Kafka, a partition is the smallest storage unit that holds a subset of records owned by a topic. Each partition is a single log file to which records are written in an append-only fashion.
For most implementations you want to follow the rule of thumb of around 10 partitions per topic and 10,000 partitions per Kafka cluster. Going beyond those amounts can require additional monitoring and optimization.
In Kafka, each record is stored as a key-value pair, and storage happens at the partition level. The record key is what the producer uses to decide which partition a record is written to, so records with the same key land in the same partition.
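The key-to-partition mapping can be sketched in a few lines. This is a simplified model, not Kafka's real DefaultPartitioner (which hashes the serialized key bytes with murmur2); the point is only that the mapping is deterministic, so the same key always lands in the same partition.

```java
// Simplified sketch of key-based partitioning (NOT Kafka's real
// partitioner, which uses murmur2 over the serialized key bytes).
public class KeyPartitioning {

    // Hypothetical helper: map a record key to one of numPartitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is non-negative,
        // then take the remainder to pick a partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key -> same partition, so per-key ordering is preserved.
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3)); // true
    }
}
```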
Does it mean if i want to have more than 1 consumer (from the same group) reading from one topic I need to have more than 1 partition?
The key property of Kafka here is that each partition is consumed by exactly one consumer within a subscribing consumer group. With this property, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes.
To answer your question: yes, in the context of the same group, if you want to have N consumers all actively reading, you have to have at least N partitions.
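A small simulation makes the "at least N partitions" point concrete. This is a simplified round-robin assignment, not Kafka's actual RangeAssignor/StickyAssignor (the consumer and group names are hypothetical): each partition goes to exactly one consumer in the group, so consumers beyond the partition count receive nothing.

```java
import java.util.*;

// Sketch of how one consumer group spreads partitions over its
// consumers. Simplified round-robin assignment for illustration.
public class GroupAssignment {

    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Each partition is owned by exactly one consumer.
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 consumers but only 3 partitions: consumer "c4" sits idle.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
        // prints {c1=[0], c2=[1], c3=[2], c4=[]}
    }
}
```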
Does it mean I need same amount of partitions as amount of consumers for the same group?
As explained above: you need at least as many partitions as active consumers. A single consumer can read from several partitions, so you may have fewer consumers than partitions, but any consumers beyond the partition count will sit idle.
How many consumers can read from one partition?
Within one consumer group, at most one consumer reads from a given partition. Across groups, each subscribing group gets its own reader, so the number of consumers that can read from one partition equals the number of consumer groups subscribing to that topic.
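This "one reader per partition per group" rule can be sketched as follows. The group names and the modulo ownership rule are hypothetical; in real Kafka the group coordinator performs the assignment. The sketch shows that each group independently picks exactly one of its consumers for a given partition.

```java
import java.util.*;

// Sketch: each subscribing group independently assigns partition p
// to one of its consumers, so the readers of a partition = one
// consumer per subscribing group.
public class ReadersPerPartition {

    static Map<String, String> readersOf(int partition, Map<String, Integer> groupSizes) {
        Map<String, String> readers = new TreeMap<>();
        for (Map.Entry<String, Integer> g : groupSizes.entrySet()) {
            // Simplified ownership rule: partition index modulo group size.
            readers.put(g.getKey(), g.getKey() + "-consumer-" + (partition % g.getValue()));
        }
        return readers;
    }

    public static void main(String[] args) {
        // Two hypothetical groups with 2 and 3 consumers respectively.
        Map<String, Integer> groups = Map.of("billing", 2, "analytics", 3);
        // Partition 1 is read by exactly one consumer from each group.
        System.out.println(readersOf(1, groups));
        // prints {analytics=analytics-consumer-1, billing=billing-consumer-1}
    }
}
```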
Relationship between keys and partitions with regard to API
First, we must understand that the producer is responsible for choosing which record to assign to which partition within the topic. Now, let's see how the producer does so, starting with the class definition of ProducerRecord.java:
public class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;
}
Here, the field that we have to understand from the class is partition.
From the ProducerRecord docs: if a valid partition number is specified, that partition will be used when sending the record. If no partition is specified but a key is present, a partition will be chosen using a hash of the key. If neither key nor partition is present, a partition will be assigned in a round-robin fashion.
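Those three rules can be sketched in plain Java. This is a simplified model, not Kafka's actual DefaultPartitioner (which uses murmur2 hashing and, in recent versions, a sticky strategy rather than strict round-robin for keyless records); it only shows the precedence: explicit partition first, then key hash, then round-robin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the partition-selection rules quoted from the
// ProducerRecord docs, in precedence order.
public class PartitionChooser {
    private final int numPartitions;
    private final AtomicInteger roundRobin = new AtomicInteger(0);

    PartitionChooser(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    int choose(Integer explicitPartition, String key) {
        if (explicitPartition != null) {
            return explicitPartition;               // rule 1: caller chose the partition
        }
        if (key != null) {
            // rule 2: hash of the key (Kafka really uses murmur2)
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }
        // rule 3: neither key nor partition -> round-robin
        return roundRobin.getAndIncrement() % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser(3);
        System.out.println(chooser.choose(2, "ignored-key")); // 2 (explicit partition wins)
        System.out.println(chooser.choose(null, null));       // 0 (round-robin)
        System.out.println(chooser.choose(null, null));       // 1 (round-robin)
    }
}
```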