 

Kafka single consumer failure in a group

Tags:

apache-kafka

I am in the initial phases of exploring Kafka, version 0.8.1.1.

I've successfully run the Consumer Group Example with multiple partitions, and it's distributing messages among the consumers quite well.

One test case I wanted to run is when a consumer in the group dies suddenly (for example, kill -9). When I did so, I expected rebalancing to occur, but it's not happening. So, can I do one of these things?

  1. Trigger rebalancing using the API.
  2. Configure Kafka to wait a certain time for consumer activity and then rebalance automatically, assuming the consumer was shut down ungracefully.

The problem here is that all the messages in the partitions assigned to the dead consumer remain in the queue and are never processed until rebalancing occurs.

binit asked May 21 '14 05:05



1 Answer

Rebalancing happens automatically once the dead consumer's ZooKeeper session times out. The timeout is set in the consumer config (zookeeper.session.timeout.ms). As per the documentation:

zookeeper.session.timeout.ms: ZooKeeper session timeout. If the consumer fails to heartbeat to ZooKeeper for this period of time, it is considered dead and a rebalance will occur. The default value is 6000 ms.

The other live consumers in the same group will start to receive the messages after the timeout interval.

Configure this timeout value as per your requirements.
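As a minimal sketch of where this setting goes, the snippet below builds the consumer properties with an explicit session timeout. The class and method names, the ZooKeeper address, and the group name are hypothetical. In the Consumer Group Example, properties like these would typically be passed to `new ConsumerConfig(props)`; that step is left out here so the snippet runs without the Kafka jars.

```java
import java.util.Properties;

// Hypothetical helper: collects the consumer settings discussed above.
class ConsumerTimeoutConfig {

    // Builds properties for the high-level consumer with an explicit
    // ZooKeeper session timeout (in milliseconds).
    static Properties build(String zkConnect, String groupId, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);
        props.put("group.id", groupId);
        // A consumer that misses heartbeats for this long is considered dead,
        // which triggers a rebalance among the remaining group members.
        props.put("zookeeper.session.timeout.ms", Integer.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Address and group name are placeholders for illustration only.
        Properties props = build("localhost:2181", "test-group", 6000);
        System.out.println(props.getProperty("zookeeper.session.timeout.ms")); // prints 6000
    }
}
```

A shorter timeout means a killed consumer's partitions are reassigned sooner, at the cost of more spurious rebalances if a live consumer pauses for longer than the timeout.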

Also, some more info from the Kafka documentation:

Consumer rebalancing fails (you will see ConsumerRebalanceFailedException): This is due to conflicts when two consumers are trying to own the same topic partition. The log will show you what caused the conflict (search for "conflict in ").

  1. If your consumer subscribes to many topics and your ZK server is busy, this could be caused by consumers not having enough time to see a consistent view of all consumers in the same group. If this is the case, try increasing rebalance.max.retries and rebalance.backoff.ms.
  2. Another reason could be that one of the consumers is hard killed. Other consumers during rebalancing won't realize that consumer is gone until after zookeeper.session.timeout.ms has elapsed. In this case, make sure that rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms.
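The inequality in the second point can be checked before deploying a configuration. The following is a rough sketch with a hypothetical helper name; the numbers in `main` are illustrative values, not recommendations.

```java
// Hypothetical helper: verifies that the total rebalance retry window
// outlasts the ZooKeeper session timeout, as the documentation advises.
class RebalanceWindowCheck {

    // Returns true when rebalance.max.retries * rebalance.backoff.ms
    // is strictly greater than zookeeper.session.timeout.ms.
    static boolean retryWindowCoversSessionTimeout(int maxRetries, long backoffMs, long sessionTimeoutMs) {
        return (long) maxRetries * backoffMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        // 4 retries * 2000 ms = 8000 ms, which exceeds a 6000 ms session timeout.
        System.out.println(retryWindowCoversSessionTimeout(4, 2000, 6000)); // prints true
        // 4 retries * 1000 ms = 4000 ms, which is shorter than the session timeout,
        // so the surviving consumers could exhaust their retries before the
        // hard-killed consumer's session expires.
        System.out.println(retryWindowCoversSessionTimeout(4, 1000, 6000)); // prints false
    }
}
```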
Tadaka answered Sep 22 '22 07:09
