Kafka Partition and Throughput

Question

I have introductory experience with Kafka and I am trying to explore its details.

I am trying to understand how Kafka partitions can help improve throughput. Everything I found online explains that more partitions mean more parallel streams, which makes sense.

However, from a different point of view, it does not.

Let's say I have two consumers, each consuming 10 messages per second from a given topic. No matter whether they consume from a single partition or from two different partitions, my throughput will remain the same: 20 messages per second.

I feel like I must be missing some details about the inner workings. Can you help me by explaining how multiple Kafka partitions can improve throughput for a fixed number of consumers, compared with a single partition?

JR ibkr · Accepted Answer

https://kafka.apache.org/intro

When I started to learn Kafka, I had the same question. The following explanation should help you answer it:

Let's say you have a topic A with 3 partitions: X, Y & Z.

First thing to understand is how data is distributed across partitions:

A producer can choose which partition a message will go to. So your producer can send message#1 to partition-X, message#2 to partition-Y and message#3 to partition-Z. In the same way, other producers can choose which partition their data will be written to. If your producer does not choose a partition, the producer client picks one for you (for example, by hashing the message key). For more information, please check out the producer API. A producer never writes message#1 to partition-X, partition-Y and partition-Z all at once: each message lands in exactly one partition. Partitions are not replicas; fault tolerance comes from replicas, which are copies of each partition kept on other brokers.
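To make the three possibilities concrete (explicit partition, keyed message, neither), here is a minimal Python sketch of partition selection. `SimplePartitioner` is a hypothetical class, not part of any Kafka client. It uses `zlib.crc32` as a stand-in hash, whereas Kafka's Java client hashes the key bytes with murmur2, and recent clients batch keyless records with a "sticky" strategy rather than strict round-robin.

```python
import itertools
import zlib
from typing import Optional


class SimplePartitioner:
    """Hypothetical, simplified partitioner for illustration only.

    Not Kafka's real implementation: crc32 stands in for murmur2, and
    keyless records use plain round-robin instead of sticky batching.
    """

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # Round-robin counter used only for records with no key and no explicit partition.
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes] = None,
                  explicit: Optional[int] = None) -> int:
        # 1. The producer chose a partition explicitly: use it as-is.
        if explicit is not None:
            if not 0 <= explicit < self.num_partitions:
                raise ValueError("no such partition")
            return explicit
        # 2. The record has a key: the same key always maps to the same partition.
        if key is not None:
            return zlib.crc32(key) % self.num_partitions
        # 3. No key and no explicit choice: spread records across partitions.
        return next(self._next)


# Topic A with 3 partitions: X=0, Y=1, Z=2.
p = SimplePartitioner(num_partitions=3)
print([p.partition(explicit=i) for i in range(3)])                  # [0, 1, 2]
print(p.partition(key=b"user-42") == p.partition(key=b"user-42"))  # True
print([p.partition() for _ in range(4)])                            # [0, 1, 2, 0]
```

Whichever branch is taken, the function returns a single partition index, which matches the point above: one message goes to exactly one partition.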

Now, a consumer subscribes to your topic. Kafka looks at how many consumers are active within a consumer group and may assign partitions to consumers as follows:

(Figure: Kafka partition distribution)

(In the figure, P0, P1, P2 and P3 are partitions. Consumer group A has consumers C1 and C2: C1 listens to P0 and P3, and C2 listens to P1 and P2. In the end, consumer group A as a whole receives data from all partitions.)
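The rule the figure illustrates can be checked mechanically: within one consumer group, every partition has exactly one owner, and together the consumers cover all partitions. The function name `check_group_assignment` below is hypothetical, not a Kafka API; it is a small Python sketch applied to the figure's assignment.

```python
def check_group_assignment(partitions, assignment):
    """Hypothetical checker: each partition must be owned by exactly one
    consumer of the group, and no partition may be left unread."""
    owners = {}
    for consumer, owned in assignment.items():
        for p in owned:
            if p in owners:
                raise ValueError(f"{p} assigned to both {owners[p]} and {consumer}")
            owners[p] = consumer
    missing = set(partitions) - set(owners)
    if missing:
        raise ValueError(f"unassigned partitions: {sorted(missing)}")
    return owners


# Assignment shown in the figure for consumer group A.
figure_assignment = {"C1": ["P0", "P3"], "C2": ["P1", "P2"]}
owners = check_group_assignment(["P0", "P1", "P2", "P3"], figure_assignment)
print(owners)  # {'P0': 'C1', 'P3': 'C1', 'P1': 'C2', 'P2': 'C2'}
```

An assignment that gave P0 to both C1 and C2 would raise `ValueError`, which is the situation Kafka's group protocol avoids.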

With topic A's 3 partitions:
1. If your consumer group has 3 consumers and you add one new consumer, one of the four consumers will sit idle. Only one consumer per group can read a given partition, so the number of active consumers in a consumer group is at most the number of partitions.
2. If your consumer group has 2 consumers and you add a new one, a rebalance is triggered and Kafka assigns one partition to each of the three consumers.
3. If this is a brand-new consumer group with a single consumer, Kafka will assign all partitions to that consumer.
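The three cases can be reproduced with a toy round-robin assignor. `round_robin_assign` is a hypothetical helper that only mirrors the idea behind Kafka's `RoundRobinAssignor`; Kafka ships several assignors (range, round-robin, sticky, cooperative sticky) that differ in detail, so the exact partition-to-consumer mapping on a real cluster may not match this sketch.

```python
def round_robin_assign(partitions, consumers):
    """Simplified round-robin assignment (a sketch, not Kafka's assignor code).

    Deals partitions out to consumers in turn; any consumer left without a
    partition stays idle.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = ["X", "Y", "Z"]  # topic A

# Case 1: 3 consumers plus 1 new one -> one consumer sits idle.
print(round_robin_assign(partitions, ["C1", "C2", "C3", "C4"]))
# {'C1': ['X'], 'C2': ['Y'], 'C3': ['Z'], 'C4': []}

# Case 2: 2 consumers plus 1 new one -> one partition each after the rebalance.
print(round_robin_assign(partitions, ["C1", "C2", "C3"]))
# {'C1': ['X'], 'C2': ['Y'], 'C3': ['Z']}

# Case 3: a brand-new group with a single consumer -> it gets every partition.
print(round_robin_assign(partitions, ["C1"]))
# {'C1': ['X', 'Y', 'Z']}
```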

Now let's assume your consumer is single-threaded and takes about 1 second to process a message. Then your throughput would be 1 msg/second in case #3.

In case #2, it would be 3 msg/second, because each consumer is listening to a different partition and processing its data in parallel.

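In case #1, you won't get any further benefit: one consumer sits idle, so throughput stays at 3 msg/second. This is also why, in the question's example, two consumers in the same consumer group reading a single partition give you 10 messages per second rather than 20: only one of them is assigned that partition.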
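The arithmetic in these cases reduces to one formula: only min(partitions, consumers) consumers in a group can be active, so group throughput is bounded by that number times the per-consumer rate. The sketch below assumes, as the answer does, single-threaded consumers with a fixed processing time, and additionally assumes every partition always has messages waiting; `group_throughput` is a hypothetical helper. The last two calls apply it to the question's setup, with both consumers in the same consumer group.

```python
def group_throughput(num_partitions, num_consumers, msgs_per_sec_per_consumer):
    """Upper bound on a consumer group's throughput under the assumptions
    above: idle consumers (beyond the partition count) contribute nothing."""
    active_consumers = min(num_partitions, num_consumers)
    return active_consumers * msgs_per_sec_per_consumer


rate = 1.0  # 1 message processed per second per consumer
print(group_throughput(3, 1, rate))  # case 3: 1.0
print(group_throughput(3, 3, rate))  # case 2: 3.0
print(group_throughput(3, 4, rate))  # case 1: 3.0 (one consumer idle)

# The question's setup: two consumers at 10 msg/s each, in one consumer group.
print(group_throughput(1, 2, 10))  # 10: only one consumer owns the single partition
print(group_throughput(2, 2, 10))  # 20: each consumer owns one partition
```

In practice, skewed keys or uneven partition load make real throughput lower than this bound.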


Tags: apache-kafka

ankit patel
