I am using kafka to stream the events of page visits by the website users to an analytics service. Each event will contain the following details for the consumer: <ul> <li>user id</li> <li>IP address of the user</li> </ul> I need very high throughput, so I decided to partition the topic with partition key as <code>userId-ipAddress</code> ie <blockquote> For a userId 1000 and ip address 10.0.0.1, the event will have partition key as "1000-10.0.0.1" </blockquote> In this use case the partition key is dynamic, so specifying the number of partitions upfront while creating the topic. Is it possible to create topic in kafka with dynamic partition count? Is it a good practice to use this kind of partitioning or Is there any other way this can be achieved?

It's not possible to create a Kafka topic with dynamic partition count. When you create a topic you have to specify the number of partitions. You can change it later manually using Replication Tools. But I don't understand why do you need dynamic partition count in the first place. The partition key is not related to the number of partitions. You can use your partition key with ten partitions or with thousand partitions. When you send a message to Kafka topic, Kafka must send it to a specific partition. Every partition is identify by it's ID which is simply a number. Kafka computes something like this <pre class="prettyprint"><code>partition_id = hash(partition_key) % number_of_partition </code></pre> and it sends the message to partition <code>partition_id</code>. If you have far more users than partitions you should be OK. More suggestions: <ul> <li>Use <code>userId</code> as a partition key. You probably don't need IP address as a part of partition key. What is it good for? Typically you need all messages from a single user to end up in a single partition. If you have IP address as a partition key then the messages from a single user could end up in multiple partitions. I don't know your use case but it general that's not what you want. </li> <li>Measure how many partitions you need to process all messages. Then create let's say ten times more partitions. You can create more partitions than you actually need. Kafka won't mind and there are no performance penalties. See How to choose the number of topics/partitions in a Kafka cluster? </li> </ul> Right now you should be able to process all messages in your system. If traffic grows you can add more Kafka brokers and you can use Replication tools to change leaders/replicas for partitions. If the traffic grows more than ten times you must create new partitions.

Is it possible to create a kafka topic with dynamic partition count?

1 Answers

It's not possible to create a Kafka topic with dynamic partition count. When you create a topic you have to specify the number of partitions. You can change it later manually using Replication Tools.

But I don't understand why do you need dynamic partition count in the first place. The partition key is not related to the number of partitions. You can use your partition key with ten partitions or with thousand partitions. When you send a message to Kafka topic, Kafka must send it to a specific partition. Every partition is identify by it's ID which is simply a number. Kafka computes something like this

partition_id = hash(partition_key) % number_of_partition

and it sends the message to partition partition_id. If you have far more users than partitions you should be OK. More suggestions:

Use userId as a partition key. You probably don't need IP address as a part of partition key. What is it good for? Typically you need all messages from a single user to end up in a single partition. If you have IP address as a partition key then the messages from a single user could end up in multiple partitions. I don't know your use case but it general that's not what you want.
Measure how many partitions you need to process all messages. Then create let's say ten times more partitions. You can create more partitions than you actually need. Kafka won't mind and there are no performance penalties. See How to choose the number of topics/partitions in a Kafka cluster?

Right now you should be able to process all messages in your system. If traffic grows you can add more Kafka brokers and you can use Replication tools to change leaders/replicas for partitions. If the traffic grows more than ten times you must create new partitions.

161

answered Nov 02 '22 04:11

Lukáš Havrlant

Related questions
                            
                                How does `nullable=False` work in SQLAlchemy
                            
                                Whats the replacement for @Scripts.Render in MVC 6
                            
                                Default value is GUID in SQL Server table column
                            
                                NotificationCompat.Builder setLargeIcon() not working?
                            
                                Why does `System.out.println(null);` give "The method println(char[]) is ambiguous for the type PrintStream error"?
                            
                                Angular 2 Keyboard Events
                            
                                android data binding: how to get useful error messages
                            
                                Why so many tasks in my spark job? Getting 200 Tasks By Default
                            
                                Google App Engine vs Heroku
                            
                                Telling Chrome to debug js rather than ts
                            
                                How to overload constructors in JavaScript ECMA6? [duplicate]
                            
                                What is the in-practice difference between generic and protocol-typed function parameters?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is it possible to create a kafka topic with dynamic partition count?

Tags:

vivek_jonam

People also ask

1 Answers

Lukáš Havrlant

Recent Activity

Donate For Us