
Kafka: Single consumer group in multiple instances

I am working on implementing a Kafka-based solution for our application. As I understand the Kafka documentation, each consumer in a consumer group (which is a thread) is internally mapped to one partition of the subscribed topic.

Let's say I have a topic with 40 partitions and a high-level consumer running in 4 instances. I do not want one instance to consume messages that another instance has already consumed. But if one instance goes down, the other three instances should be able to process all the messages.

  • Should I use the same consumer group with 10 threads per instance? According to Stack Overflow, instances that share a consumer group behave like a traditional synchronous queue:

In Apache Kafka why can't there be more consumer instances than partitions?

  • Or should I use a different consumer group per instance?

Using the simple (low-level) consumer gives control over partition assignment, but then if one instance goes down, the other three instances would not process the messages from the partitions that the failed instance was consuming.

Sudharsan asked Jun 16 '17 10:06


1 Answer

First, to explain the concept of consumers and consumer groups:

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

The records will be effectively load balanced over the consumer instances in a consumer group. If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.

The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
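The "fair share" behaviour described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function `assign_partitions` and the instance names are hypothetical, and the round-robin rule is a stand-in for Kafka's real, configurable assignment strategies, which may hand out partitions in a different order.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over consumers round-robin (a simplified model,
    not Kafka's actual assignor)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical instance names, all members of one consumer group.
instances = ["instance-1", "instance-2", "instance-3", "instance-4"]

# All four instances alive: every partition has exactly one owner.
before = assign_partitions(40, instances)

# instance-1 dies: its partitions are redistributed to the survivors.
after = assign_partitions(40, instances[1:])

for label, assignment in (("before", before), ("after", after)):
    sizes = {name: len(parts) for name, parts in assignment.items()}
    print(label, sizes)
```

In this model every partition has exactly one owner both before and after the failure, so no record is processed twice within the group and none is left unconsumed.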

Now to answer your questions,

1. I do not want one instance to consume messages that another instance has already consumed. But if one instance goes down, the other three instances should be able to process all the messages.

This is Kafka's default behaviour. You just have to give all 4 instances the same consumer group name (the group.id consumer setting).

2. Should I use the same consumer group with 10 threads per instance?

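With 4 instances of 10 threads each, the group has 40 consumers, so each thread is assigned exactly one of the 40 Kafka partitions, which is optimal. With fewer threads the group still consumes every partition, but each thread is assigned several partitions, which MAY overload some of the consumer instances. Note that if one instance goes down, its 10 partitions are redistributed among the 30 remaining threads, so some threads will own two partitions until the instance comes back.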

3. In Apache Kafka why can't there be more consumer instances than partitions?

Within a consumer group, a partition can be assigned to only one consumer at a time. Thus, creating more consumers than partitions leaves the extra consumers idle: they will not receive any records from Kafka unless a rebalance hands them a partition.
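The arithmetic behind answers 2 and 3 can be sketched as follows. The helper `partitions_per_thread` is hypothetical and assumes partitions are spread as evenly as possible over all consumer threads in the group; the real distribution depends on the assignment strategy in use.

```python
from collections import Counter


def partitions_per_thread(num_partitions, num_instances, threads_per_instance):
    """Return how many partitions each consumer thread owns, assuming an
    even spread over all threads in the group (a simplifying assumption)."""
    total_threads = num_instances * threads_per_instance
    base, extra = divmod(num_partitions, total_threads)
    # The first 'extra' threads own one more partition than the rest.
    return [base + 1 if i < extra else base for i in range(total_threads)]


# label -> (instances, threads per instance), for a 40-partition topic
scenarios = {
    "4 instances x 10 threads": (4, 10),
    "3 instances x 10 threads (one instance down)": (3, 10),
    "4 instances x 5 threads": (4, 5),
    "4 instances x 15 threads (more threads than partitions)": (4, 15),
}

for label, (instances, threads) in scenarios.items():
    load = partitions_per_thread(40, instances, threads)
    # Counter maps "partitions owned" -> "number of threads with that load";
    # a key of 0 counts idle threads.
    print(label, dict(Counter(load)))
```

A thread count that matches the partition count gives one partition per thread, fewer threads mean more partitions per thread, and any threads beyond the partition count own nothing.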

4. Should I use a different consumer group per instance?

No. This will duplicate the records: because each instance is in its own consumer group, every record will be delivered to every instance.
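The difference between one shared group and one group per instance can be shown with a toy delivery model. Everything here is illustrative: `deliver`, the group names, and the instance names are hypothetical, and choosing the owner by `partition % len(members)` is a simplification of how partitions are assigned within a group.

```python
from collections import defaultdict


def deliver(records, groups):
    """Toy model: each record goes to every group, and within a group only
    to the single member that owns the record's partition."""
    received = defaultdict(list)
    for partition, value in records:
        for members in groups.values():
            owner = members[partition % len(members)]
            received[owner].append(value)
    return dict(received)


# One record on each of 8 partitions.
records = [(p, f"msg-{p}") for p in range(8)]

# All instances share one group: the records are split between them.
shared = deliver(records, {"app": ["i1", "i2", "i3", "i4"]})

# One group per instance: every instance receives every record.
separate = deliver(records, {"g1": ["i1"], "g2": ["i2"], "g3": ["i3"], "g4": ["i4"]})

print("shared group:", {name: len(msgs) for name, msgs in shared.items()})
print("separate groups:", {name: len(msgs) for name, msgs in separate.items()})
```

With the shared group the 8 records are processed once in total; with separate groups they are processed once per instance.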

Hope this clarifies your doubts.

Daniccan answered Oct 23 '22 22:10