
One Kafka consumer for multiple topics vs one consumer for each topic/partition

Tags:

apache-kafka

I am working on a data ingestion use case where data arrives on multiple topics and has to be pushed to multiple tables based on the topic name. I am trying to understand whether having one consumer for all the topics performs any differently from having one consumer for each topic/partition.

Rajesh asked Mar 13 '18 09:03



1 Answer

The performance difference between these two scenarios depends on the number of brokers and partitions and on the expected throughput.

When the number of brokers and partitions and the throughput are all high, a single consumer for all partitions very likely won't be able to cope with the traffic.

For example, if you have 5 brokers with 5 partitions on each, and each partition receives 5 MB/s of traffic:

  • if you have a single consumer: it needs a connection to each broker and has to handle 5 x 5 x 5 MB/s = 125 MB/s. Depending on your consumer configuration, this might not be feasible. Even if it can handle 125 MB/s, this does not scale well if you end up adding more partitions.

  • if you have multiple consumers: each consumer takes a subset of the partitions. With 5 consumers, each only has to handle 25 MB/s, which is easily feasible on a standard VM.
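The arithmetic in the example above can be checked with a small sketch. The function name `per_consumer_load` is made up for illustration, and it assumes partitions are spread as evenly as possible across consumers and that every partition carries the same traffic, which real workloads may not do.

```python
def per_consumer_load(brokers, partitions_per_broker, mb_per_partition, consumers):
    """Return (total MB/s, MB/s handled by the busiest consumer).

    Assumption: partitions are divided as evenly as possible and each
    partition carries the same traffic.
    """
    total_partitions = brokers * partitions_per_broker
    total_mb = total_partitions * mb_per_partition
    # Consumers beyond the partition count would have nothing to read.
    active = min(consumers, total_partitions)
    # Ceiling division: the busiest consumer gets the largest share.
    busiest_partitions = -(-total_partitions // active)
    return total_mb, busiest_partitions * mb_per_partition


print(per_consumer_load(5, 5, 5, consumers=1))  # (125, 125)
print(per_consumer_load(5, 5, 5, consumers=5))  # (125, 25)
```

The total stays at 125 MB/s in both cases; only the share each consumer has to handle changes.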

Kafka's consumer group feature makes it very easy to add consumers on the fly: consumers that share the same group.id split the subscribed partitions between them. So you can start with a single consumer and add more if and when the throughput increases.
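To illustrate what adding consumers to a group does, here is a toy model of partition assignment. It is not Kafka's actual assignor (the real rebalance is driven by the group coordinator and the configured assignment strategy); the function `assign_round_robin`, the topic name `events` and the consumer names are all hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Toy model: hand out partitions to consumers in turn.

    This only mimics the idea of a consumer group splitting partitions;
    it is not the algorithm Kafka itself runs.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical topic "events" with 6 partitions.
partitions = [f"events-{i}" for i in range(6)]

# Start with a single consumer: it owns every partition.
one = assign_round_robin(partitions, ["c1"])
print(one)

# Add two more consumers to the same group: each now owns a third.
three = assign_round_robin(partitions, ["c1", "c2", "c3"])
print(three)
```

In this model, going from one consumer to three cuts each consumer's share from six partitions to two, which is the scaling behaviour the answer relies on.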

Mickael Maison answered Oct 22 '22 01:10
