This is more of a system design question.
Let's assume I have a microservice architecture and I have X instances of Service B (to load-balance HTTP requests to the service). But Service B is also a consumer on some Kafka topic. How can I avoid processing the same message X times (X is the number of instances of Service B)? At-least-once could be fine if processing is idempotent. It doesn't need to be exactly-once, but it cannot be X times.
Service A could be an Order Service. It produces messages to the Orders topic whenever a user places an order.
Service B could be a Payment Service. It consumes messages from the Orders topic to charge the user.
Paying for an order could be an idempotent operation. But still, if I have 10 instances of the Payment Service, I don't want to waste CPU and IO doing the same thing 10 times.
Even if partitioning is the answer, what if we have more instances of a particular microservice than partitions?
Kafka consumer group
When you have x instances of a service and you want each message to be consumed by the service only once, this is exactly what Kafka's consumer group concept takes care of.
Essentially, you need to specify a common Kafka consumer group ID for your service instances. Kafka then assigns each partition of the topic to exactly one instance in the group, so each message is delivered to only one instance of your service rather than to all of them.
There is a setting for the consumer group ID (group.id) among the consumer configs of the Kafka client library you're using. You just need to ensure that all instances of the service are given the same value for that setting.
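As a minimal sketch of that setting, the snippet below builds the consumer configuration for a few instances as plain dictionaries. The keys bootstrap.servers, group.id and client.id are standard Kafka consumer settings; the broker address, the instance names and the helper build_consumer_config are hypothetical and only for illustration. How the dictionary is handed to a client depends on the library you use.

```python
# Hypothetical values: the broker address and group name are placeholders.
BOOTSTRAP_SERVERS = "localhost:9092"
GROUP_ID = "serviceBConsumerGroup"


def build_consumer_config(instance_name):
    """Return the consumer settings for one instance of the service."""
    return {
        "bootstrap.servers": BOOTSTRAP_SERVERS,
        # Identical on every instance: this is what makes them one group.
        "group.id": GROUP_ID,
        # Differs per instance; only helps tell clients apart in logs.
        "client.id": instance_name,
    }


configs = [build_consumer_config(f"service-b-{i}") for i in range(3)]
group_ids = {config["group.id"] for config in configs}
print(group_ids)
```

If a second, unrelated service should also receive every message from the same topic, it would use a different group.id of its own.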
Explanation with an example
If your service B has 10 instances, you specify a common Kafka consumer group ID for all 10 instances, say serviceBConsumerGroup. When consumption from a Kafka topic with 10 partitions begins, Kafka takes care of assigning the partitions of the topic across the instances of the consumer group serviceBConsumerGroup. So essentially it would assign one partition to each instance (when there are 10 instances of the service and 10 partitions of the topic). If there are 5 instances of the service and 10 partitions of the Kafka topic, Kafka would assign 2 partitions to every instance for consumption.
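The arithmetic in this example can be checked with a small simulation. This is a simplified round-robin model, not Kafka's actual assignment logic (the real assignors are pluggable and more involved); the function assign_partitions and the instance names are made up for illustration. The point it shows is that every partition has exactly one owner within the group.

```python
def assign_partitions(num_partitions, instances):
    """Spread partitions over instances round-robin.

    Each partition ends up with exactly one owner in the group.
    """
    assignment = {name: [] for name in instances}
    for partition in range(num_partitions):
        owner = instances[partition % len(instances)]
        assignment[owner].append(partition)
    return assignment


# 10 partitions, 10 instances: one partition each.
ten = assign_partitions(10, [f"b-{i}" for i in range(10)])

# 10 partitions, 5 instances: two partitions each.
five = assign_partitions(10, [f"b-{i}" for i in range(5)])

for name, partitions in five.items():
    print(name, partitions)
```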
As per the question,
You don't need to worry about the same message being reprocessed by every instance, even if you have multiple instances of the service. Kafka tracks a consumer offset per consumer group: once a message has been read and its offset committed, that message is not delivered again to that consumer group.
Let's take an example,
The Order service publishes messages to the Order topic and the Payment service subscribes to it, with 10 instances of the Payment service. Suppose message 1 is consumed by Payment service instance 1. (All instances must be configured with the same consumer group ID; Kafka does not put them into a shared default group for you.) After processing message 1, instance 1 commits the offset for that message at the consumer group level, and the message is then considered successfully processed. Payment service instance 2, or any other instance in the group, will only pick up messages whose offsets have not been committed.
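To make the offset mechanics concrete, here is a toy in-memory model of one partition and one consumer group. It is not a Kafka client; the class PartitionLog, the charge function and the order IDs are all hypothetical. It also shows why at-least-once delivery goes well with an idempotent handler: if an instance crashes after processing but before committing, the next owner of the partition sees the same message again.

```python
class PartitionLog:
    """One partition plus the committed offset of a single consumer group."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message the group should read

    def poll(self):
        """Return (offset, message) at the committed position, or None."""
        if self.committed >= len(self.messages):
            return None
        return self.committed, self.messages[self.committed]

    def commit(self, offset):
        """Mark the message at `offset` as processed by the group."""
        self.committed = offset + 1


charged = set()  # order IDs already charged; makes the handler idempotent


def charge(order_id):
    """Charge once per order; return False if this was a redelivery."""
    if order_id in charged:
        return False
    charged.add(order_id)
    return True


log = PartitionLog(["order-1", "order-2"])

# Instance 1 processes order-1 but crashes before committing the offset.
offset, order = log.poll()
first = charge(order)

# The partition moves to instance 2, which resumes at the committed offset.
offset, order = log.poll()
second = charge(order)  # order-1 again; skipped by the idempotency check
log.commit(offset)

offset, order = log.poll()  # now order-2
third = charge(order)
log.commit(offset)
```

In this model the duplicate happens at most once per failure, not once per instance, which is the distinction the question is asking about.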
As per @Madhu's answer, consumer groups are there for faster, parallel processing of messages. If you have 10 instances of the Payment service and want to process messages faster, you can add more consumers to the consumer group. But you need to consider the partition count of the Order topic: if the Order topic has 4 partitions and you run 5 consumers, 1 consumer will always sit idle unless another consumer goes down.
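The partition limit can be written as a tiny calculation. This sketch only encodes the rule that a partition has at most one consumer per group, so the number of busy consumers is capped by the partition count; the function name is made up for illustration.

```python
def consumer_usage(num_partitions, num_consumers):
    """Return (active, idle) consumer counts for one consumer group."""
    active = min(num_partitions, num_consumers)
    idle = num_consumers - active
    return active, idle


# 4 partitions, 5 consumers: 4 active, 1 idle standby.
active, idle = consumer_usage(4, 5)
print(active, idle)
```

So running more instances than partitions is safe: the extra instances still serve HTTP traffic and act as standbys for Kafka consumption. To use them for consumption as well, the topic needs more partitions.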