Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Kafka Resiliency - Group Coordinator

Tags:

As I understand, one of the brokers is selected as the group coordinator which takes care of consumer rebalancing.

Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group

I have 3 nodes with replication factor of 3 and 3 partitions. Everything is great and when I kill kafka on non-coordinator nodes, consumer is still receiving messages.

But when I kill that specific node with coordinator, rebalancing is not happening and my java consumer app does not receive any messages.

2018-05-29 16:34:22.668 INFO  AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.689 INFO  AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.801 INFO  AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.832 INFO  AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.933 INFO  AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:23.044 WARN  ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets. 

Am I doing something wrong and is there a way around this?

like image 508
Anton Kim Avatar asked May 29 '18 21:05

Anton Kim


1 Answers

But when I kill that specific node with coordinator, rebalancing is not happening and my java consumer app does not receive any messages.

The group coordinator receives heartbeats from all consumers in the consumer group. It maintains a list of active consumers and initiates the rebalancing on the change of this list. Then the group leader executes the rebalance activity.

That's why the rebalancing will stop if you kill the group coordinator.

UPDATE

In the case that the group coordinator broker shutdowns, the Zookeeper will be notified and the election starts to promote a new group coordinator from the active brokers automatically. So nothing to do with group coordinator. Let's see the log:

2018-05-29 16:34:23.044 WARN  ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.

The replication factor of internal topic __consumer_offset probably has the default value 1. Can you check what value of default.replication.factor and offsets.topic.replication.factor are in the server.properties files. If the values is 1 by default, it should be changed to bigger one. Failing to do so, the group coordinator shutdowns causing offset manager stops without backup. So the activity of committing offsets can not be done.

like image 165
Quang Vien Avatar answered Oct 04 '22 18:10

Quang Vien