Guidelines to handle Timeout exception for Kafka Producer?

Tags: java, apache-kafka, kafka-producer-api, timeoutexception

I often get Timeout exceptions due to various reasons in my Kafka producer. I am using all the default values for producer config currently.

I have seen the following Timeout exceptions:

org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.

org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-1-0: 30001 ms has passed since last append

I have the following questions:

1. What are the general causes of these Timeout exceptions?
   1. Temporary network issue?
   2. Server issue? If yes, what kind of server issue?
2. What are the general guidelines for handling the Timeout exception?
   1. Set the 'retries' config so that the Kafka API does the retries?
   2. Increase 'request.timeout.ms' or 'max.block.ms'?
   3. Catch the exception and have the application layer retry sending the message? This seems hard with async send, as messages will then be sent out of order.
3. Are Timeout exceptions retriable exceptions, and is it safe to retry them?

I am using Kafka v2.1.0 and Java 11.

Thanks in advance.

450

asked Feb 20 '19 07:02

xabhi

2 Answers

"What are the general causes of these Timeout exceptions?"

1. The most common cause I have seen is stale metadata: one broker went down, and the topic partitions on that broker failed over to other brokers. However, the topic metadata was not updated properly, so the client still tries to talk to the failed broker, either to get metadata or to publish the message. That causes the timeout exception.
2. Network connectivity issues. These can be easily diagnosed with telnet broker_host broker_port
3. The broker is overloaded. This can happen if the broker is saturated with a high workload, or hosts too many topic partitions.

To handle the timeout exceptions, the general practice is:

1. Rule out broker-side issues. Make sure that the topic partitions are fully replicated and that the brokers are not overloaded.
2. Fix host name resolution or network connectivity issues, if there are any.
3. Tune parameters such as request.timeout.ms, delivery.timeout.ms, etc. In my experience, the default values work fine in most cases.
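
If you do end up tuning, here is a minimal sketch of how the timeout settings relate to each other. It uses plain string keys in java.util.Properties so it has no Kafka dependency. The class name ProducerTimeoutConfig and the helper isConsistent are made up for illustration, and the values shown are what I understand the 2.1 producer defaults to be, so check them against the documentation for your client version. As far as I know, delivery.timeout.ms is expected to be at least linger.ms + request.timeout.ms.

```java
import java.util.Properties;

/** Illustrative sketch (hypothetical class): producer timeout settings as plain string keys. */
final class ProducerTimeoutConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound on how long send() may block, e.g. while waiting for metadata.
        props.put("max.block.ms", "60000");
        // How long the client waits for the broker's response to a single request.
        props.put("request.timeout.ms", "30000");
        // How long the producer may hold a record back to batch it with others.
        props.put("linger.ms", "0");
        // Total time budget for a record after send() returns, including retries.
        props.put("delivery.timeout.ms", "120000");
        return props;
    }

    /** Checks the assumed constraint: delivery.timeout.ms >= linger.ms + request.timeout.ms. */
    static boolean isConsistent(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        return delivery >= linger + request;
    }
}
```

The resulting Properties object is what you would hand to the KafkaProducer constructor. Running isConsistent before that is only a convenience check for settings you have changed by hand.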

171

answered Oct 14 '22 04:10

yuyang

The default Kafka config values, both for producers and brokers, are conservative enough that, under general circumstances, you shouldn't run into any timeouts. Those problems typically point to a flaky/lossy network between the producer and the brokers.

The exception you're getting, Failed to update metadata, usually means one of the brokers is not reachable by the producer, and the effect is that it cannot get the metadata.

For your second question, Kafka will automatically retry sending messages that were not fully acknowledged by the brokers. It's up to you whether to catch the exception and retry on the application side when you get a timeout, but if you're hitting 1+ minute timeouts, retrying is probably not going to make much of a difference. You're going to have to figure out the underlying network/reachability problems with the brokers anyway.
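
If you still want an application-side retry, a bounded retry with backoff around a blocking send is one way to sketch it. The helper below is hypothetical and deliberately free of Kafka types: BoundedRetry, its parameters, and the backoff policy are my own assumptions, not part of the Kafka API. With the real client, the action would be something like a blocking producer.send(record).get(), and the predicate could test for org.apache.kafka.common.errors.RetriableException, which the producer's TimeoutException extends.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Illustrative sketch (hypothetical helper): bounded retry with exponential backoff. */
final class BoundedRetry {

    static <T> T call(Callable<T> action, Predicate<Exception> isRetriable,
                      int maxAttempts, long initialBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on the last attempt, or as soon as the error is not retriable.
                if (attempt >= maxAttempts || !isRetriable.test(e)) {
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2; // double the wait before the next attempt
            }
        }
    }
}
```

Because each call blocks until it succeeds or gives up, messages sent from one thread through this helper stay in order, at the cost of throughput. With fully asynchronous sends, retrying from the callback can reorder messages, which is the concern raised in the question.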

In my experience, usually the network problems are:

1. Port 9092 is blocked by a firewall, either on the producer side or on the broker side, or somewhere in the middle (try nc -z broker-ip 9092 from the server running the producer).
2. DNS resolution is broken, so even though the port is open, the producer cannot resolve the broker's host name to an IP address.
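
If nc is not available on the producer host, the same two checks can be sketched in plain Java. The class BrokerReachability and its Result enum are hypothetical names for illustration; the code only uses standard java.net calls. It resolves the host name first and then attempts a TCP connection with a timeout, so you can tell a DNS failure apart from a blocked port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Illustrative sketch (hypothetical class): DNS and TCP reachability check for a broker. */
final class BrokerReachability {

    enum Result { REACHABLE, DNS_FAILURE, CONNECT_FAILURE }

    static Result check(String host, int port, int timeoutMs) {
        InetAddress address;
        try {
            // Step 1: can the host name be resolved at all?
            address = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return Result.DNS_FAILURE;
        }
        // Step 2: can a TCP connection be opened within the timeout?
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMs);
            return Result.REACHABLE;
        } catch (IOException e) {
            // Covers refused connections and connect timeouts (e.g. a firewall dropping packets).
            return Result.CONNECT_FAILURE;
        }
    }
}
```

For example, BrokerReachability.check("broker-ip", 9092, 3000) mirrors the nc command above. Keep in mind that the producer also connects to the addresses the brokers advertise in their metadata, so it is worth running the check against those host names as well as the bootstrap address.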

answered Oct 14 '22 04:10

mjuarez
