A quick question concerning Kafka's topics and partitioning. Suppose the following scenario:
Producer1 writes data into Topic1.
Producer2 writes data into Topic2.
Consumer1 reads data from Topic1 and Topic2.
Consumer2 reads data only from Topic2.
The question is: how many partitions are there inside each topic? Is it true that it depends on the number of consumers, to promote parallelism? Or is it just a parameter set in the server.config file? In the latter case, is there a way to have different topics with different numbers of partitions?
The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. On the consumer side, Kafka always gives a single partition’s data to one consumer thread. Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.
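How many partitions are there inside each topic? That's configurable per topic. The num.partitions setting in the broker configuration file (server.properties, not server.config) is only the default, used when a topic is created without an explicit partition count (for example, an auto-created topic), so different topics can have different numbers of partitions. You can increase a topic's partition count, but once increased, you cannot decrease it. The kafka-topics.sh tool that ships with Apache Kafka provides an alter command to change a topic's settings, and we will use it to add more partitions to an existing topic.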
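Here is the command to increase the partition count for topic 'my-topic' to 20. On Kafka 2.2 and later, pass --bootstrap-server with a broker address (for example localhost:9092) instead of --zookeeper, which was removed in Kafka 3.0; the same applies to the describe command.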
./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my-topic --partitions 20
You can verify that the partition count has been increased by using the describe command as follows:
./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic
How many partitions should you set for a topic? This well-written article covers it: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
You can specify the number of partitions when you create a topic. For example, say you have created Topic1 with 40 partitions. Now you start just one consumer. This consumer will be assigned every partition of Topic1.
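As a rough sketch of giving each topic its own partition count at creation time, the snippet below only prints the kafka-topics.sh commands, so it can be tried without a running cluster. The broker address localhost:9092, the replication factor of 1, the count of 10 for Topic2, and the helper name create_topic_cmd are all assumptions made for illustration. The --bootstrap-server form assumes Kafka 2.2 or later.

```shell
#!/bin/sh
# Assumed values, adjust for your installation.
KAFKA_TOPICS="./bin/kafka-topics.sh"   # path to the CLI tool
BOOTSTRAP="localhost:9092"             # assumed broker address

# Hypothetical helper: print the create command for one topic
# with its own partition count (arguments: topic name, partitions).
create_topic_cmd() {
  printf '%s --create --bootstrap-server %s --topic %s --partitions %s --replication-factor 1\n' \
    "$KAFKA_TOPICS" "$BOOTSTRAP" "$1" "$2"
}

# Two topics with different partition counts.
create_topic_cmd Topic1 40
create_topic_cmd Topic2 10
```

Pipe the output to sh once the printed commands look right for your cluster.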
If you want to increase parallelism, you can start several consumers in a consumer group. For example, starting 10 consumers with the same consumer group id leads to each consumer being assigned 4 of the 40 partitions.
FYI, starting more consumers (in a consumer group) than the topic has partitions makes no sense: the extra consumers will be idle.
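The arithmetic behind this can be sketched with a small shell function. It assumes partitions are spread as evenly as possible across the group, which is a simplification: the assignor Kafka actually uses may hand out specific partitions differently, but the per-consumer counts and the idle consumers come out the same way. The function name assignment is made up for this example.

```shell
#!/bin/sh
# Print how many partitions each consumer in a group would own,
# assuming an even spread (a simplification of Kafka's assignors).
# Arguments: number of partitions, number of consumers.
assignment() {
  partitions=$1
  consumers=$2
  base=$((partitions / consumers))    # every consumer gets at least this many
  extra=$((partitions % consumers))   # the first $extra consumers get one more
  i=0
  while [ "$i" -lt "$consumers" ]; do
    if [ "$i" -lt "$extra" ]; then
      count=$((base + 1))
    else
      count=$base
    fi
    echo "consumer-$i: $count partition(s)"
    i=$((i + 1))
  done
}

assignment 40 10   # 10 consumers share 40 partitions
assignment 3 5     # more consumers than partitions
```

In the second call, two of the five consumers end up with 0 partitions, which is the idle case described above.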
For more information take a look at the official Kafka documentation: https://kafka.apache.org/documentation/#intro_consumers