
Kafka console consumer ERROR "Offset commit failed on partition"

I am using kafka-console-consumer to probe a Kafka topic.

Intermittently, I am getting this error message, followed by 2 warnings:

[2018-05-01 18:14:38,888] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-56648] Offset commit failed on partition my-topic-0 at offset 444: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

[2018-05-01 18:14:38,888] WARN [Consumer clientId=consumer-1, groupId=console-consumer-56648] Asynchronous auto-commit of offsets {my-topic-0=OffsetAndMetadata{offset=444, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

[2018-05-01 18:14:38,888] WARN [Consumer clientId=consumer-1, groupId=console-consumer-56648] Synchronous auto-commit of offsets {my-topic-0=OffsetAndMetadata{offset=447, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

The warning logs suggest:

This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

So I need to either increase max.poll.interval.ms or decrease max.poll.records.

What are the implications of each approach, and which one is preferred in which situation?

Yeming Huang asked May 02 '18 13:05


1 Answer

Increasing max.poll.interval.ms says “it’s OK to spend more time processing a large batch of records,” and you’ll gain throughput if you can process larger batches more efficiently than smaller ones.

Decreasing max.poll.records says “take fewer records so there’s enough time to process them,” and favors latency over throughput.
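As a rough illustration of the trade-off, here is a minimal Python sketch of the arithmetic. It assumes records are processed one at a time inside the poll loop and that you have measured an average per-record processing time yourself. The function names, the 800 ms figure in the usage note, and the 50% headroom are hypothetical and chosen only for illustration. The two constants are the documented consumer defaults.

```python
# Documented Kafka consumer defaults.
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000  # max.poll.interval.ms (5 minutes)
DEFAULT_MAX_POLL_RECORDS = 500          # max.poll.records


def worst_case_batch_ms(max_poll_records, per_record_ms):
    """Upper bound on the time to process one full batch sequentially."""
    return max_poll_records * per_record_ms


def fits_in_interval(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if a full batch can be processed before the next poll() is due."""
    return worst_case_batch_ms(max_poll_records, per_record_ms) < max_poll_interval_ms


def largest_safe_batch(per_record_ms, max_poll_interval_ms, headroom=0.5):
    """Largest max.poll.records whose worst case uses at most
    `headroom` (a fraction, hypothetical safety margin) of the interval."""
    return max(1, int(max_poll_interval_ms * headroom / per_record_ms))
```

For example, with a hypothetical 800 ms per record, a full default batch of 500 records would need 400,000 ms, which exceeds the default 300,000 ms interval. You could either raise max.poll.interval.ms above that figure or lower max.poll.records to what `largest_safe_batch` suggests.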

Also consider the possibility that both settings are fine and something else is causing performance issues within your poll loop. I would explore that first, before changing configuration, so you don’t mask a bigger problem.
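One way to explore that is to measure the gap between successive poll() calls. The sketch below is not tied to any Kafka client library: `poll` and `process` are hypothetical callables standing in for your consumer's poll call and your record handler, and `clock` is injectable so the loop can be exercised without real delays.

```python
import time


def find_slow_gaps(poll, process, max_poll_interval_ms, iterations,
                   clock=time.monotonic):
    """Run a poll loop and return every gap (in ms) between successive
    poll() calls that exceeded max_poll_interval_ms.

    `poll` returns an iterable of records and `process` handles one
    record; both are placeholders for real consumer code.
    """
    slow_gaps = []
    last_poll = None
    for _ in range(iterations):
        now = clock()
        if last_poll is not None:
            gap_ms = (now - last_poll) * 1000
            if gap_ms > max_poll_interval_ms:
                slow_gaps.append(gap_ms)
        last_poll = now
        for record in poll():
            process(record)
    return slow_gaps
```

If the gaps are long even for small batches, the bottleneck is more likely in the processing code (or something it calls) than in either setting.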

Nathan Walther answered Sep 22 '22 08:09