
Why doesn't Kafka keep working when one of the brokers fails?

Tags:

apache-kafka

I am under the impression that, with two brokers and sync mode turned on, my Kafka setup should keep working even if one of the brokers fails.

To test this, I created a new topic named topicname. Its description is as follows:

Topic:topicname    PartitionCount:1 ReplicationFactor:1 Configs:
Topic: topicname    Partition: 0    Leader: 0   Replicas: 0 Isr: 0
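
For reference, output like this typically comes from the 0.8-era topics tool; a minimal sketch, assuming ZooKeeper runs on localhost:2181:

# describe the topic to see partition count, replication factor, leader, and ISR
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic topicname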

Then I ran the console producer and consumer as follows:

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9095 --sync --topic topicname

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topicname --from-beginning

While both brokers were running, I saw that messages were being received properly by the consumer, but when I killed one of the broker instances with the kill command, the consumer stopped showing any new messages. Instead it printed the following error:

WARN [ConsumerFetcherThread-console-consumer-57116_ip-<internalipvalue>-1438604886831-603de65b-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 865; ClientId: console-consumer-57116; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo: [topicname,0] -> PartitionFetchInfo(9,1048576). Possible cause: java.nio.channels.ClosedChannelException (kafka.consumer.ConsumerFetcherThread)
[2015-08-03 12:29:36,341] WARN Fetching topic metadata with correlation id 1 for topics [Set(topicname)] from broker [id:0,host:<hostname>,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
asked Aug 03 '15 by Sumit Sinha


1 Answer

I had a similar problem; setting the producer config "topic.metadata.refresh.interval.ms" to -1 (or whatever value suits you) solved the issue for me. In my case I had 3 brokers (a multi-broker setup on my local machine) and created the topic with 3 partitions and a replication factor of 2.
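
For context, a topic like that could be created with the same-era tooling; a minimal sketch, assuming ZooKeeper on localhost:2181 and the topic name from the question:

# hypothetical: create a topic with 3 partitions and replication factor 2
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic topicname --partitions 3 --replication-factor 2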

Test setup:

Before the producer config change:

With all 3 brokers running, I killed one of the brokers after the producer had started. The local ZooKeeper updated the ISR and topic metadata (removing the downed broker as leader), but the producer did not pick up the change (possibly due to the default 10-minute refresh interval), so message sends ended up failing and I got send exceptions.

After the producer config change (-1 in my case):

With all 3 brokers running, I killed one of the brokers after the producer had started. The local ZooKeeper updated the ISR (removing the downed broker as leader), the producer refreshed the new ISR/topic metadata, and message sends did not fail.

A value of -1 makes the producer refresh topic metadata on every failed send attempt, so you may want to set the refresh interval to something reasonable instead.
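
For illustration, a sketch of how this might look in an old (Scala) producer properties file; the broker list and the other entries here are assumptions, not the exact config from this setup:

# hypothetical producer.properties for the 0.8-era producer API
metadata.broker.list=localhost:9092,localhost:9093,localhost:9095
producer.type=sync
# -1: refresh topic metadata only on a failed send; otherwise it also
# refreshes at this interval (default 600000 ms = 10 minutes)
topic.metadata.refresh.interval.ms=-1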

answered Nov 16 '22 by Gaurav