As I understand, one of the brokers is selected as the group coordinator which takes care of consumer rebalancing.
Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group
I have 3 nodes with replication factor of 3 and 3 partitions. Everything is great and when I kill kafka on non-coordinator nodes, consumer is still receiving messages.
But when I kill that specific node with coordinator, rebalancing is not happening and my java consumer app does not receive any messages.
2018-05-29 16:34:22.668 INFO AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.689 INFO AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.801 INFO AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.832 INFO AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.933 INFO AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:23.044 WARN ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.
Am I doing something wrong and is there a way around this?
But when I kill that specific node with coordinator, rebalancing is not happening and my java consumer app does not receive any messages.
The group coordinator receives heartbeats from all consumers in the consumer group. It maintains a list of active consumers and initiates the rebalancing on the change of this list. Then the group leader executes the rebalance activity.
That's why the rebalancing will stop if you kill the group coordinator.
UPDATE
In the case that the group coordinator broker shutdowns, the Zookeeper will be notified and the election starts to promote a new group coordinator from the active brokers automatically. So nothing to do with group coordinator. Let's see the log:
2018-05-29 16:34:23.044 WARN ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.
The replication factor of internal topic __consumer_offset probably has the default value 1. Can you check what value of default.replication.factor and offsets.topic.replication.factor are in the server.properties files. If the values is 1 by default, it should be changed to bigger one. Failing to do so, the group coordinator shutdowns causing offset manager stops without backup. So the activity of committing offsets can not be done.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With