
Why does Kafka fail to commit offsets for a particular partition?

Tags:

apache-kafka

The Kafka consumer is failing to commit offsets, but only for one particular partition.

[2019-01-04 12:22:22,691] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-11955] Offset commit failed on partition my-topic-2-9 at offset 0: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2019-01-04 12:22:28,617] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-11955] Offset commit failed on partition my-topic-2-9 at offset 1: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2019-01-04 12:23:18,875] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-11955] Offset commit failed on partition my-topic-2-9 at offset 1: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

Can anyone explain this error and what could possibly cause it?

Our cluster has 5 brokers running in AWS. We use Apache Kafka 2.1.

I am running a very simple kafka-console-producer and consuming the same message with a kafka-console-consumer.

I am seeing this error after the console-consumer consumes the message.

// PRODUCER
./bin/kafka-console-producer.sh --broker-list kafka1:9092 --topic my-topic-2 \
  --property "parse.key=true" --property "key.separator=,"

// CONSUMER
./bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 --from-beginning \
  --topic my-topic-2 --property "print.key=true"
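
For reference, the timeout in these errors appears to correspond to the broker-side offsets.commit.timeout.ms setting (its 5000 ms default matches the figure discussed in EDIT 4 below): the broker fails a commit if the write to the internal offsets topic is not fully replicated within that window. A sketch of raising it in each broker's server.properties (10000 ms is an arbitrary example value, and raising it only treats the symptom):

# server.properties (per broker, restart required) -- sketch only.
# Default is 5000 ms; commits fail with "The request timed out" if the offsets
# topic write is not acknowledged by all in-sync replicas within this window.
offsets.commit.timeout.ms=10000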

Note that our cluster has over 200 topics with many producers and consumers.

It's just that I am not able to understand this behavior.

Here's a screenshot from Grafana showing our data input rate.

EDIT:

Please feel free to ask for any more details. This error is really frustrating.

EDIT 2:

./bin/kafka-topics.sh --describe --zookeeper zookeeper1:2181/kafka --topic my-topic-2
Topic:my-topic-2    PartitionCount:10   ReplicationFactor:3 Configs:
Topic: my-topic-2   Partition: 0    Leader: 4   Replicas: 4,2,3 Isr: 4,2,3
Topic: my-topic-2   Partition: 1    Leader: 5   Replicas: 5,3,4 Isr: 5,4,3
Topic: my-topic-2   Partition: 2    Leader: 1   Replicas: 1,4,5 Isr: 1,4,5
Topic: my-topic-2   Partition: 3    Leader: 2   Replicas: 2,5,1 Isr: 2,1,5
Topic: my-topic-2   Partition: 4    Leader: 3   Replicas: 3,1,2 Isr: 3,2,1
Topic: my-topic-2   Partition: 5    Leader: 4   Replicas: 4,3,5 Isr: 4,3,5
Topic: my-topic-2   Partition: 6    Leader: 5   Replicas: 5,4,1 Isr: 5,4,1
Topic: my-topic-2   Partition: 7    Leader: 1   Replicas: 1,5,2 Isr: 1,2,5
Topic: my-topic-2   Partition: 8    Leader: 2   Replicas: 2,1,3 Isr: 2,3,1
Topic: my-topic-2   Partition: 9    Leader: 3   Replicas: 3,2,4 Isr: 3,2,4
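
Since consumer offsets are stored in the internal __consumer_offsets topic, and a commit only succeeds once it is acknowledged by that topic's in-sync replicas, it may be worth describing that topic too. A sketch reusing the same command as above:

# Inspect the internal offsets topic; a partition whose Isr list is shorter
# than its Replicas list is under-replicated and can make commits time out.
./bin/kafka-topics.sh --describe --zookeeper zookeeper1:2181/kafka --topic __consumer_offsets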

EDIT 3:

I am more interested in knowing the possible causes of this issue, since that might help us figure out other problems with our cluster.

EDIT 4:

All the brokers, consumers, and producers are in the same VPC in the same region.
I understand that the offset commit timeout can be increased, but why? What's causing such latency? 5000 ms is itself a lot for a system that is supposed to be real-time.
It's possible that the Kafka brokers are overloaded or the network is congested, but why? As you can see, the data input rate is at most 2-3 Mbps. Is that too much for a Kafka cluster of 5 machines (r5.xlarge)? Tell me if that's the case; I am quite new to Kafka.
What can become a bottleneck in a setup like this? (One quick check is sketched below.)
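
One way to look for such a bottleneck with the tooling already used above is to list every partition (across all topics, including __consumer_offsets) whose ISR has fallen behind; persistent entries here point at overloaded brokers or disks:

# List all under-replicated partitions cluster-wide.
./bin/kafka-topics.sh --describe --zookeeper zookeeper1:2181/kafka --under-replicated-partitions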

Asked Oct 16 '22 08:10 by Ankur rana


1 Answer

What is the ratio of your consumer threads to topic partitions?

I found in my cluster that this error is more likely to occur when a small number of consumer threads is consuming from a large number of partitions (for instance, 1 thread assigned to 30 topic partitions).

The configuration that made this error disappear for me was 1:1 (1 consumer thread for each topic partition), but now I have scaling issues when I want to add more consumer threads to the group.

I deal with it through a consumer-deployment mechanism that enforces the 1:1 ratio (see the sketch below): for example, when deploying 3 consumers to consume 30 partitions, each one opens 10 threads; to scale, say by deploying 10 consumers, each one opens 3 threads instead.

I don't know if I'm following best practices here, but it does the job for now.
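
As a rough illustration of the 1:1 idea with the console consumer from the question (my-group-1 is a hypothetical group id; each console-consumer process is single-threaded, so one process per partition gives the 1:1 ratio for the 10-partition topic):

# Sketch: one single-threaded consumer process per partition of my-topic-2.
# All processes share the hypothetical group id "my-group-1" so the group
# coordinator spreads the 10 partitions across them, one each.
for i in $(seq 1 10); do
  ./bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 \
    --topic my-topic-2 --group my-group-1 &
done
wait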

Answered Oct 21 '22 07:10 by Tomer Lev