
Is it possible to create a kafka topic with dynamic partition count?


I am using kafka to stream the events of page visits by the website users to an analytics service. Each event will contain the following details for the consumer:

  • user id
  • IP address of the user

I need very high throughput, so I decided to partition the topic using a partition key of the form userId-ipAddress, i.e.

For userId 1000 and IP address 10.0.0.1, the event will have the partition key "1000-10.0.0.1".

In this use case the partition key is dynamic, so I can't specify the number of partitions upfront while creating the topic. Is it possible to create a topic in Kafka with a dynamic partition count?

Is it good practice to use this kind of partitioning, or is there another way to achieve this?

asked Sep 24 '15 by vivek_jonam

People also ask

Can we change number of partitions in Kafka?

If you want to change the number of partitions or replicas of your Kafka topic, you can use a streaming transformation to automatically stream all of the messages from the original topic into a new Kafka topic that has the desired number of partitions or replicas.

Can a Kafka topic have multiple partitions?

Kafka's topics are divided into several partitions. While the topic is a logical concept in Kafka, a partition is the smallest storage unit, holding a subset of the records owned by a topic. Each partition is a single log file to which records are written in an append-only fashion.

How many partitions should a Kafka topic have?

A Kafka cluster should have a maximum of 200,000 partitions across all brokers when managed by Zookeeper. The reason is that if brokers go down, Zookeeper needs to perform a lot of leader elections. Confluent still recommends up to 4,000 partitions per broker in your cluster.


1 Answer

It's not possible to create a Kafka topic with a dynamic partition count. You have to specify the number of partitions when you create the topic; you can increase it later manually using the Replication Tools.
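For illustration, creating a topic with a fixed partition count and raising it later looks like this with the stock `kafka-topics.sh` tool (broker address, topic name, and counts are placeholders; older Kafka versions took `--zookeeper` instead of `--bootstrap-server`):

```shell
# Create the topic with an explicit, fixed number of partitions
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic page-visits --partitions 12 --replication-factor 3

# Later, increase the partition count (Kafka only allows increases, never decreases)
kafka-topics.sh --bootstrap-server localhost:9092 --alter \
  --topic page-visits --partitions 24
```

Note that increasing the partition count changes which partition a given key maps to, so per-key ordering guarantees only hold for messages produced after the change.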

But I don't understand why you need a dynamic partition count in the first place. The partition key is not related to the number of partitions: you can use your partition key with ten partitions or with a thousand. When you send a message to a Kafka topic, Kafka must route it to a specific partition. Every partition is identified by its ID, which is simply a number. Kafka computes something like

partition_id = hash(partition_key) % number_of_partitions 

and sends the message to partition partition_id. If you have far more users than partitions, you should be fine. More suggestions:

  • Use userId as the partition key. You probably don't need the IP address as part of it — what would it be good for? Typically you want all messages from a single user to end up in the same partition. If the IP address is part of the partition key, messages from a single user could end up in multiple partitions. I don't know your use case, but in general that's not what you want.
  • Measure how many partitions you need to process all your messages, then create, say, ten times that many. You can create more partitions than you actually need; Kafka won't mind and there are no real performance penalties. See How to choose the number of topics/partitions in a Kafka cluster?
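The key-to-partition mapping described above can be sketched in a few lines. This is a simplified stand-in: CRC32 replaces Kafka's actual murmur2 hash, and the function name is made up for illustration, but the modulo logic is the same, which is why any number of distinct keys fits into a fixed partition count:

```python
import zlib

def partition_for(partition_key: str, number_of_partitions: int) -> int:
    """Mimic partition_id = hash(partition_key) % number_of_partitions.

    CRC32 stands in for Kafka's real hash (murmur2); the routing
    logic — hash the key, take it modulo the partition count — is the same.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % number_of_partitions

num_partitions = 10

# The same key is always routed to the same partition...
assert partition_for("1000-10.0.0.1", num_partitions) == \
       partition_for("1000-10.0.0.1", num_partitions)

# ...and arbitrarily many distinct keys still map into the fixed set
# of partition IDs 0..num_partitions-1.
for user_id in range(100_000):
    assert 0 <= partition_for(str(user_id), num_partitions) < num_partitions
```

This is why a "dynamic" key space needs no dynamic partition count: the hash collapses any number of keys onto the fixed set of partition IDs.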

With that headroom you should be able to process all messages in your system. If traffic grows, you can add more Kafka brokers and use the Replication Tools to move leaders/replicas between brokers. Only if traffic grows more than tenfold would you need to create new partitions.

answered Nov 02 '22 by Lukáš Havrlant