Kafka Resiliency - Group Coordinator

Question

As I understand it, one of the brokers is selected as the group coordinator, which takes care of consumer rebalancing.

Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group

I have 3 nodes, with a replication factor of 3 and 3 partitions. Everything works, and when I kill Kafka on the non-coordinator nodes, the consumer still receives messages.

But when I kill the specific node that hosts the coordinator, no rebalancing happens and my Java consumer app does not receive any messages.

2018-05-29 16:34:22.668 INFO  AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.689 INFO  AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.801 INFO  AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.832 INFO  AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.933 INFO  AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:23.044 WARN  ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.

Am I doing something wrong and is there a way around this?

Quang Vien · Accepted Answer

But when I kill the specific node that hosts the coordinator, no rebalancing happens and my Java consumer app does not receive any messages.

The group coordinator receives heartbeats from all consumers in the consumer group. It maintains the list of active consumers and initiates a rebalance whenever that list changes. The group leader, which is one of the consumers, then carries out the rebalance by computing the partition assignment.

That's why rebalancing stops if you kill the group coordinator.

UPDATE

If the group coordinator's broker shuts down, ZooKeeper is notified and a new group coordinator is elected automatically from the remaining active brokers. So the coordinator itself is not the problem. Let's look at the log:
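To make the failover a bit more concrete: as far as I know, the broker side maps each group id onto one partition of the internal __consumer_offsets topic, and the broker leading that partition acts as the group's coordinator. Below is a minimal Java sketch of that mapping, assuming offsets.topic.num.partitions is left at its default of 50. The class and method names are made up for illustration and are not part of the Kafka API.

```java
// Illustrative only: mirrors the broker-side mapping from a group id to a
// __consumer_offsets partition. Not part of the Kafka client API.
final class CoordinatorPartition {

    // Assumption: offsets.topic.num.partitions is left at its default of 50.
    static final int DEFAULT_OFFSETS_PARTITIONS = 50;

    // Returns the __consumer_offsets partition that a group id maps to.
    static int partitionFor(String groupId, int offsetsPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so guard that one case.
        int positive = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return positive % offsetsPartitions;
    }

    public static void main(String[] args) {
        int p = partitionFor("good_group", DEFAULT_OFFSETS_PARTITIONS);
        System.out.println("good_group -> __consumer_offsets-" + p);
    }
}
```

Whichever broker leads the printed partition is the coordinator for good_group. If that partition has only one replica and its broker dies, no other broker can take over the role.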

2018-05-29 16:34:23.044 WARN  ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.

The internal topic __consumer_offsets probably has a replication factor of 1. Can you check the values of default.replication.factor and offsets.topic.replication.factor in your server.properties files? If either value is 1, change it to a larger one (3 would match your three brokers). Note that, as far as I know, offsets.topic.replication.factor is only applied when the topic is first created, so an already existing __consumer_offsets topic also needs its replicas increased. Otherwise, when the broker hosting the group coordinator shuts down, the offset manager stops with it and there is no replica to take over, so offsets cannot be committed.
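If you want to check several brokers' config files quickly, here is a small Java sketch that reads the text of a server.properties file and reports which of the two settings above are explicitly set below a minimum. The class name, method name and sample contents are hypothetical. It only looks at the config text; the replication factor the topic actually has still needs to be checked on the cluster itself, for example by describing the __consumer_offsets topic.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative helper, not part of Kafka.
final class ReplicationCheck {

    // The two settings mentioned in the answer.
    static final String[] KEYS = {
        "default.replication.factor",
        "offsets.topic.replication.factor"
    };

    // Returns the keys that are explicitly set to a value below `minimum`.
    // Absent keys are skipped, because the broker then uses its built-in default.
    static List<String> tooLow(String serverProperties, int minimum) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(serverProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> flagged = new ArrayList<>();
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value != null && Integer.parseInt(value.trim()) < minimum) {
                flagged.add(key);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical file contents, for illustration only.
        String sample = "broker.id=1\noffsets.topic.replication.factor=1\n";
        System.out.println(tooLow(sample, 3));
    }
}
```

In practice you would pass in the contents of each broker's real server.properties file instead of the sample string.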

Anton Kim
