
What's the best way to design a message key in Kafka?

I have a partitioned topic, which has X partitions.

Currently, when producing messages, I create a Kafka ProducerRecord specifying only the topic and the value; I do not define a key. As far as I understand, my messages will be distributed evenly among the partitions by the default built-in partitioner. On the consuming side, I have a thread pool of Kafka consumers. Each consumer runs in its own dedicated thread and consumes messages from the topic. All of the consumers are given the same group.id, which allows messages to be consumed in parallel: every consumer is assigned its fair share of partitions to read from.

I want my messages to be consumed in order. I know that Kafka guarantees the order of messages only within a partition. So, as long as I come up with a proper key structure, messages that must stay in order relative to each other will end up in the same partition. In effect, the message key groups related messages and routes them to a single partition.

Does it make sense?

Q: Is there a chance that a badly designed key will give me uneven partitions, where one partition receives far more records than the others? Can that hurt the performance of my Kafka cluster? What are the best practices for message key design?

Ihor M. asked Aug 25 '17 19:08




2 Answers

Your understanding of the default partitioner is correct.

If you have no requirement to consume messages in the same order as they were produced, then not specifying a key is the best option. If you do, then that requirement tells you what your key must be. For instance, if you want to preserve the order of produced messages for a given user, user_id is a good candidate for your message key.

To achieve a particular message order you also need to think about how your producers are configured. If a producer retries failed sends and allows more than one in-flight request per connection (max.in.flight.requests.per.connection greater than 1), messages can be written out of order.

You can get uneven partitions by choosing a bad key. For example, if 90% of your users are from New York and 10% are from other cities, and you choose the city as the key, then one of your partitions will be huge and one of your consumers will be overloaded (assuming every user produces the same number of messages).
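As a minimal sketch of the producer settings this paragraph refers to, the snippet below builds a plain java.util.Properties object using the standard producer configuration keys. The class name, the broker address, and the retry count are illustrative assumptions, not values from the question.

```java
import java.util.Properties;

final class OrderedProducerConfig {

    // Builds producer settings that trade throughput for strict ordering.
    static Properties strictOrdering() {
        Properties props = new Properties();
        // Hypothetical broker address; replace with your own cluster.
        props.put("bootstrap.servers", "localhost:9092");
        // Retries are kept so transient failures do not drop messages (3 is an arbitrary example).
        props.put("retries", "3");
        // Only one unacknowledged request at a time, so a retried batch
        // cannot be overtaken by a later batch on the same connection.
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; in a real application they would be passed to a producer.
        strictOrdering().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Limiting in-flight requests to 1 reduces throughput, so it is worth applying only to topics where ordering actually matters.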
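The New York example can be checked with a few lines of code. All records with the same key land in one partition, so the share of records carrying the most frequent key is a lower bound on the share of the busiest partition, whatever hash function is used. The sketch below computes that share for two candidate keys; the class name and the sample data (1000 users, 900 of them in New York, one message per user) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeySkew {

    // Fraction of records that carry the single most frequent key.
    // The busiest partition receives at least this fraction of all records.
    static double hottestKeyShare(List<String> keys) {
        if (keys.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        int max = 0;
        for (int count : counts.values()) {
            max = Math.max(max, count);
        }
        return (double) max / keys.size();
    }

    public static void main(String[] args) {
        List<String> keyedByCity = new ArrayList<>();
        List<String> keyedByUser = new ArrayList<>();
        // Hypothetical population: 900 users in New York, 100 spread over 10 other cities.
        for (int user = 0; user < 1000; user++) {
            keyedByCity.add(user < 900 ? "New York" : "city-" + (user % 10));
            keyedByUser.add("user-" + user);
        }
        System.out.println("city key, hottest share: " + hottestKeyShare(keyedByCity));
        System.out.println("user key, hottest share: " + hottestKeyShare(keyedByUser));
    }
}
```

With the city key, one partition carries at least 90% of the traffic no matter how many partitions the topic has; with the user key, no single key accounts for more than 0.1%.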

Daniel answered Nov 03 '22 10:11


Kafka applies a murmur2 hash to the serialized key and takes the result modulo the number of partitions, i.e. murmur2(keyBytes) % numPartitions, after forcing the hash to be non-negative. In all likelihood your keys will be evenly distributed under default partitioning. I would suggest experimenting with all your candidate keys using a simple murmur2 function written in Java to see the distribution pattern, and then making a choice. Also note that there are two implementations of default partitioning in Kafka: the murmur hash implementation is in the newer producer, while older legacy versions work differently.
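A rough sketch of such an experiment is below. The murmur2 method is a port, written from memory, of the 32-bit murmur2 variant in Kafka's Java client, so treat it as an assumption and compare it against the Utils class of the client version you actually run. The class name, the "user-N" keys, the partition count of 6, and the assumption that keys are strings serialized as UTF-8 are all illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeyDistribution {

    // 32-bit murmur2, ported from memory; verify against your Kafka client before relying on it.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;

        // Mix the input four bytes at a time (little-endian).
        for (int i = 0; i < length / 4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix in the remaining 1 to 3 bytes, if any.
        int tail = length & ~3;
        int rem = length % 4;
        if (rem == 3) h ^= (data[tail + 2] & 0xff) << 16;
        if (rem >= 2) h ^= (data[tail + 1] & 0xff) << 8;
        if (rem >= 1) {
            h ^= data[tail] & 0xff;
            h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Partition for a key: clear the sign bit, then take the remainder.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // assumes UTF-8 string keys
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Counts how many of the given keys fall into each partition.
    static int[] distribution(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add("user-" + i); // hypothetical candidate keys
        }
        System.out.println(Arrays.toString(distribution(keys, 6)));
    }
}
```

Feeding in a sample of real keys instead of the generated ones shows how distinct keys spread across partitions. It does not account for how often each key occurs, so a single very frequent key can still overload one partition.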

Swapnil answered Nov 03 '22 11:11