
Kafka topic partitions

Tags:

apache-kafka

A quick question concerning Kafka's topics and partitioning. Suppose the following scenario:

  • Producer1 writes data into Topic1.

  • Producer2 writes data into Topic2.

  • Consumer1 reads data from Topic1 and Topic2.

  • Consumer2 reads data only from Topic2.

The question is: how many partitions are there inside each Topic? Is it true that it depends on the number of consumers, in order to promote parallelism? Or is it just a parameter set in the file server.config? In the latter case, is there a way to have different topics with different numbers of partitions?

Asked by Giacomo Bartoli on Mar 01 '18 at 16:03


2 Answers

The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. On the consumer side, Kafka always gives a single partition’s data to one consumer thread. Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.

How many partitions are there inside each Topic? That is configurable per topic. You can increase the partition count of an existing topic, but you can never decrease it. Kafka's kafka-topics.sh tool provides an alter command for changing a topic's configuration; here it is used to add partitions to an existing topic.

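Here is the command to increase the partition count for topic 'my-topic' to 20 (this form targets older Kafka releases; from Kafka 2.2 onward kafka-topics.sh accepts --bootstrap-server with a broker address such as localhost:9092, and the --zookeeper option was removed in Kafka 3.0):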

./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my-topic --partitions 20

You can verify that the partition count has increased with the describe command:

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic

How many partitions should you set for a topic? This well-written article covers the trade-offs: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
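As a rough illustration of the sizing question, the linked article suggests a rule of thumb: measure the throughput a single partition sustains on the producer side (p) and on the consumer side (c), pick a target throughput (t), and provision at least max(t/p, t/c) partitions. The sketch below only does that arithmetic. The function name and the MB/s figures are made up for the example, not measurements from any real cluster.

```python
import math


def min_partitions(target: float, producer_per_partition: float,
                   consumer_per_partition: float) -> int:
    """Rough lower bound on partition count: max(t/p, t/c), rounded up.

    All three arguments must use the same unit (for example MB/s).
    """
    if min(target, producer_per_partition, consumer_per_partition) <= 0:
        raise ValueError("throughput values must be positive")
    needed = max(target / producer_per_partition,
                 target / consumer_per_partition)
    return math.ceil(needed)


# Hypothetical numbers: 100 MB/s target, 10 MB/s per partition when
# producing, 20 MB/s per partition when consuming.
print(min_partitions(100, 10, 20))
```

Treat the result as a starting point only; the article also discusses costs of having too many partitions, which this arithmetic ignores.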

Answered by Monzurul Shimul on Nov 22 '22 at 23:11


You can specify the number of partitions when you create a topic. For example, suppose you have created Topic1 with 40 partitions and you start just one consumer. That consumer will be assigned every partition of Topic1.

If you want to increase parallelism, you can start several consumers in a consumer group. For example, starting 10 consumers with the same consumer group id leads to every consumer being assigned approximately 4 partitions.

Note that starting more consumers in a consumer group than the topic has partitions adds no parallelism: the extra consumers will be idle.
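The numbers above can be checked with a small simulation. This is a simplified model, not Kafka's actual assignor: it just hands partitions out to the consumers of one group in round-robin order, with every partition going to exactly one consumer. The function name and the consumer ids are hypothetical.

```python
def assign_round_robin(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer of the group, in turn."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# One consumer, 40 partitions: it owns all of them.
solo = assign_round_robin(40, ["consumer-0"])
print(len(solo["consumer-0"]))

# Ten consumers in the same group: the partitions are split evenly.
group = assign_round_robin(40, [f"consumer-{i}" for i in range(10)])
print({c: len(parts) for c, parts in group.items()})

# Fifty consumers for 40 partitions: the surplus consumers own nothing.
crowded = assign_round_robin(40, [f"consumer-{i}" for i in range(50)])
idle = [c for c, parts in crowded.items() if not parts]
print(len(idle))
```

In a real cluster the split is decided by the configured partition assignment strategy and may not be perfectly even, but the bound is the same: a group never has more active consumers than partitions.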

For more information take a look at the official Kafka documentation: https://kafka.apache.org/documentation/#intro_consumers

Answered by codejitsu on Nov 22 '22 at 23:11