I have a partitioned topic with X partitions. As of now, when producing messages, I create Kafka's ProducerRecord specifying only the topic and the value; I do not define a key.
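For reference, here is a minimal sketch of such a keyless producer (the broker address and topic name are assumed placeholders):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeylessProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Only topic and value are set; the key is null, so the default
            // partitioner spreads records across partitions.
            producer.send(new ProducerRecord<>("my-topic", "some payload"));
        }
    }
}
```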
As far as I understand, my messages will be distributed evenly across the partitions by the default built-in partitioner.
On the other hand, I have a thread pool of Kafka consumers. Each consumer runs in its own dedicated thread, consuming messages from the topic, and each of those consumers is given the same group.id. This allows messages to be consumed in parallel: every consumer is assigned its fair share of partitions to read from.
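A minimal sketch of that consumer pool might look like this (pool size, broker address, topic, and group name are assumed placeholders; each thread gets its own KafkaConsumer instance, since the consumer itself is not thread-safe):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConsumerGroupPool {
    public static void main(String[] args) {
        int threads = 4; // assumed pool size; more consumers than partitions gains nothing
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // assumed
                props.put("group.id", "my-consumer-group");       // same group.id for all threads
                props.put("key.deserializer", StringDeserializer.class.getName());
                props.put("value.deserializer", StringDeserializer.class.getName());

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Collections.singletonList("my-topic"));
                    while (!Thread.currentThread().isInterrupted()) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> r : records) {
                            System.out.printf("partition=%d offset=%d value=%s%n",
                                    r.partition(), r.offset(), r.value());
                        }
                    }
                }
            });
        }
    }
}
```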
I want my messages to be consumed in an orderly fashion. I know that Kafka guarantees the order of messages within a partition. So, as long as I come up with a proper key structure, messages that belong together will end up in the same partition. In effect, the message key groups related messages and stores them in the same partition.
Does it make sense?
Q: Is there a chance that, due to a badly designed key, I will get uneven partitions, with one receiving far more records than the others? Can that hurt the performance of my Kafka cluster? What are the best practices for message key design?
Usually, the key of a Kafka message is used to select the partition: the partitioner hashes the key and returns the target partition number as an int. Without a key, you would have to derive the partition from the value, which can be much more complex to process.
Kafka uses the abstraction of a distributed log that consists of partitions. Splitting a log into partitions allows the system to scale out. The key is used to determine the partition within the log to which a message gets appended, while the value is the actual payload of the message.
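To make the key-to-partition mapping concrete, here is a minimal Partitioner sketch that mirrors the default key-based logic; it is an illustration, not Kafka's actual implementation, which handles keyless records with round-robin or sticky behaviour instead of the fallback shown here:

```java
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

import java.util.Map;

// Sketch: hash the serialized key and take it modulo the partition count,
// which is what the default key-based partitioning boils down to.
public class KeyHashPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // Without a key there is nothing to hash; a real implementation
            // would fall back to round-robin / sticky assignment here.
            return 0;
        }
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```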
As a few general rules of thumb for sizing:
- at most about 4,000 partitions per broker (in total, distributed over many topics)
- at most about 200,000 partitions per Kafka cluster (in total, distributed over many topics)
resulting in a maximum of about 50 brokers per Kafka cluster.
Your understanding of the default partitioner is correct.
When you don't have a requirement to consume some messages in the same order as they were produced, then not specifying a key is the best option. If that is not your case, your requirement tells you what the key must be: for instance, if you want to preserve the order of messages produced for a given user, a user_id is a natural message key.
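A minimal sketch of such a keyed producer (the topic name, broker address, and user ids are made-up examples):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Every event for user "42" shares the key, so all of them hash to
            // the same partition and keep their produced order.
            producer.send(new ProducerRecord<>("user-events", "42", "logged-in"));
            producer.send(new ProducerRecord<>("user-events", "42", "added-to-cart"));
        }
    }
}
```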
To achieve a particular message order you also need to think about how your producers are configured. If your producers can retry sending a message in case of failure and max.in.flight.requests.per.connection is greater than 1, then messages can be produced out of order (unless idempotence is enabled).
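A sketch of producer settings that preserve ordering under retries; enabling idempotence is the modern approach, while capping in-flight requests at 1 is the older alternative:

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class OrderPreservingProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        // With idempotence enabled, ordering is preserved even with retries
        // and up to 5 in-flight requests per connection.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        // Older alternative without idempotence: allow retries but only one
        // in-flight request per connection, at the cost of throughput.
        // props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
        return props;
    }
}
```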
You can get uneven partitions by choosing a bad key. For example, if 90% of your users are from New York and 10% are from other cities, and you choose the city as the key, then one of your partitions will be huge and one of your consumers overloaded (assuming the number of messages per user is the same).
Kafka applies a murmur hash to the key and takes it modulo the number of partitions, i.e. murmur2(serializedKey) % numPartitions. With the default partitioner, the result should in all likelihood be evenly distributed. I would suggest you experiment with all your key options using a simple murmur2 function written in Java to see the distribution pattern before making a choice. Also note that there are two implementations of default partitioning in Kafka: the murmur hash implementation is in the newer versions, while old legacy versions work differently.
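A quick way to run that experiment with Kafka's own hash (Utils.murmur2 is the internal utility the default partitioner uses; the partition count and sample keys below are assumptions you would replace with your own candidates):

```java
import org.apache.kafka.common.utils.Utils;

import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

// Feed candidate keys through the same hash-and-modulo logic the default
// partitioner uses and count how many records land on each partition.
public class KeyDistributionCheck {
    public static void main(String[] args) {
        int numPartitions = 12; // assumed partition count
        int[] counts = new int[numPartitions];

        // Replace with a realistic sample of your candidate keys.
        IntStream.range(0, 100_000)
                .mapToObj(i -> "user-" + i)
                .forEach(key -> {
                    byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
                    int partition = Utils.toPositive(Utils.murmur2(bytes)) % numPartitions;
                    counts[partition]++;
                });

        for (int p = 0; p < numPartitions; p++) {
            System.out.printf("partition %2d: %d records%n", p, counts[p]);
        }
    }
}
```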