I have a Kafka cluster with 3 brokers. Lately I have started facing issues with brokers dropping out of the cluster and producers/consumers throwing "leader not available" errors.
On examining the logs I see the following sequence of events:
//Lots of replica fetcher threads starting/stopping
[2017-10-09 14:48:50,600] INFO [ReplicaFetcherManager on broker 6] Removed fetcher for partitions
[2017-10-09 14:48:50,608] INFO [ReplicaFetcherThread-0-7], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-10-09 14:48:50,918] INFO [ReplicaFetcherThread-0-7], Stopped (kafka.server.ReplicaFetcherThread)
[2017-10-09 14:48:50,918] INFO [ReplicaFetcherThread-0-7], Shutdown completed (kafka.server.ReplicaFetcherThread)
//Continuously expanding/shrinking the ISR
[2017-10-09 14:48:51,037] INFO Partition [__consumer_offsets,8] on broker 6: Expanding ISR for partition __consumer_offsets-8 from 6,8 to 6,8,7 (kafka.cluster.Partition)
[2017-10-09 14:48:51,038] INFO Partition [__consumer_offsets,35] on broker 6: Expanding ISR for partition __consumer_offsets-35 from 6,8 to 6,8,7 (kafka.cluster.Partition)
[2017-10-09 14:49:01,702] INFO Partition [t1,1] on broker 6: Shrinking ISR for partition [t1,1] from 6,7 to 6 (kafka.cluster.Partition)
[2017-10-09 14:49:01,702] INFO Partition [__consumer_offsets,41] on broker 6: Shrinking ISR for partition [__consumer_offsets,41] from 6,8,7 to 6,8 (kafka.cluster.Partition)
//Re-registration of broker and leader re-election
[2017-10-09 14:51:54,380] INFO re-registering broker info in ZK for broker 6
[2017-10-09 14:51:54,405] INFO New leader is 7 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
//ControllerMovedException errors
[2017-10-09 14:56:39,746] ERROR [KafkaApi-6] Error when handling request.. org.apache.kafka.common.errors.ControllerMovedException: Broker 6 received update metadata request with correlation id 59 from an old controller 7 with epoch 301. Latest known controller epoch is 302
[2017-10-09 14:57:59,210] INFO re-registering broker info in ZK for broker 6 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,210] INFO Creating /brokers/ids/6 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-10-09 14:57:59,213] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-10-09 14:57:59,213] INFO Registered broker 6 at path /brokers/ids/6 with addresses: EndPoint(kafka03,9092,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.utils.ZkUtils)
[2017-10-09 14:57:59,213] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,213] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,224] INFO New leader is 7 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-10-09 14:58:11,697] INFO Partition [testing1,2] on broker 6: Shrinking ISR for partition [testing1,2] from 6,8 to 6 (kafka.cluster.Partition)
[2017-10-09 14:58:11,700] INFO Partition [testing1,2] on broker 6: Cached zkVersion [199] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
These errors then repeat in a loop, and the cluster cannot recover:
[2017-10-09 16:17:26,769] INFO Partition [__consumer_offsets,14] on broker 6: Shrinking ISR for partition [__consumer_offsets,14] from 7,6,8 to 7,6 (kafka.cluster.Partition)
[2017-10-09 16:17:26,771] INFO Partition [__consumer_offsets,14] on broker 6: Cached zkVersion [306] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
On the clients, I receive "leader not available" errors.
It's not clear why the cluster enters this invalid state. Any ideas?
Kafka uses ZooKeeper for service discovery of the brokers that form the cluster. ZooKeeper notifies Kafka of topology changes, so each node in the cluster knows when a new broker joined, a broker died, a topic was removed, a topic was added, etc.
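Each live broker holds an ephemeral znode under /brokers/ids, which is what the "Creating /brokers/ids/6" lines in your logs are about. Here is a minimal sketch, assuming ZooKeeper is reachable at localhost:2181 with the default chroot, to check which brokers are currently registered:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

// Sketch: list the brokers currently registered in ZooKeeper.
// Assumes ZooKeeper at localhost:2181 and Kafka's default chroot, so the
// ephemeral broker znodes live under /brokers/ids (as in the logs above).
public class ListRegisteredBrokers {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        try {
            List<String> ids = zk.getChildren("/brokers/ids", false);
            for (String id : ids) {
                byte[] data = zk.getData("/brokers/ids/" + id, false, null);
                System.out.println("broker " + id + " -> "
                        + new String(data, StandardCharsets.UTF_8));
            }
        } finally {
            zk.close();
        }
    }
}
```

If a broker is missing from that list while its process is still running, its ZooKeeper session has expired, which matches the re-registration messages you are seeing.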
Kafka 3.3, targeted for August, would include options for both ZooKeeper and KRaft. ZooKeeper would be deprecated in the release after that and removed in Kafka 4.0. The end-of-life date for ZooKeeper is undetermined.
As the post "Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency" explains, Apache Kafka® currently uses Apache ZooKeeper™ to store its metadata: data such as the location of partitions and the configuration of topics is stored outside of Kafka itself, in a separate ZooKeeper cluster.
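The "Cached zkVersion" messages in your logs refer to the version of exactly this externally stored metadata: each partition's leader/ISR record is a znode under /brokers/topics/&lt;topic&gt;/partitions/&lt;n&gt;/state, and a broker whose cached znode version falls behind refuses to update the ISR. A small sketch to inspect that znode (topic t1 and partition 1 are taken from your logs; the connect string is a placeholder):

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch: read the leader/ISR state znode for one partition and print the
// znode's data version -- the "zkVersion" that broker 6 complains about.
public class ShowPartitionState {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        try {
            Stat stat = new Stat();
            byte[] data = zk.getData("/brokers/topics/t1/partitions/1/state", false, stat);
            System.out.println("state: " + new String(data, StandardCharsets.UTF_8));
            System.out.println("zkVersion in ZooKeeper: " + stat.getVersion());
        } finally {
            zk.close();
        }
    }
}
```

Comparing this version against the one in the broker's log line tells you how far behind the stuck broker's cache is.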
If any broker fails, data should not be lost: for fault tolerance, each partition is replicated and stored on different brokers. If a partition's leader broker fails, the controller elects one of the replicas as the new leader.
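You can watch the current leader and ISR for each partition from the outside with the Java AdminClient. A minimal sketch (the bootstrap address kafka03:9092 and topic t1 are placeholders taken from the logs above), useful when clients report "leader not available":

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

// Sketch: print the leader and ISR for every partition of a topic.
// "kafka03:9092" and "t1" are placeholders; adjust for your cluster.
public class ShowLeaderAndIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka03:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, TopicDescription> topics =
                    admin.describeTopics(Collections.singletonList("t1")).all().get();
            topics.get("t1").partitions().forEach(p ->
                    System.out.println("partition " + p.partition()
                            + " leader=" + (p.leader() == null ? "none" : p.leader().id())
                            + " isr=" + p.isr()));
        }
    }
}
```

A partition whose leader prints as "none", or whose ISR has shrunk to a single replica, corresponds directly to the errors your producers and consumers are seeing.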
This issue is known and tracked in KAFKA-2729, but it has not been resolved yet. As far as I know, it happens on networks with large delays, due to peak traffic or short network outages within a small timeframe. The only solution (AFAIK) is to restart all brokers.