 

When does the Apache Kafka client throw a "Batch Expired" exception?

Using the Apache Kafka Java client (0.9), I'm trying to send a long series of records to the broker using the Kafka Producer class.

The asynchronous send method returns immediately for a while, then starts blocking on each call for a short time period. After around thirty seconds, the client starts throwing exceptions (TimeoutException), with the message "Batch expired".

What circumstances cause this exception to be thrown?

Asked Jan 14 '16 by James Thomas

People also ask

How does Kafka handle timeout exception?

To handle timeout exceptions, the general practice is to first rule out broker-side issues: make sure that the topic partitions are fully replicated and that the brokers are not overloaded. Then fix host name resolution or network connectivity issues, if there are any.

What is Kafka consumer timeout?

The timeout used to detect client failures when using Kafka's group management facility. The client sends periodic heartbeats to indicate its liveness to the broker.

What is default session timeout MS in Kafka?

3000 is the default heartbeat interval and shouldn't need changing. For session.timeout.ms, start with 30000 and increase it if you see frequent rebalancing caused by missed heartbeats. Make sure that your request.timeout.ms is at least the recommended value of 60000 and your session.timeout.ms is at least the recommended value of 30000.

How does Kafka batch work?

Batching messages enables a Kafka producer to increase its throughput. Reducing the number of network requests the producer makes in order to send data will improve the performance of the system. The cost of increased throughput is increased latency.
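As a sketch of the batching knobs involved (the values and broker address below are illustrative placeholders, not recommendations), a producer's batching behavior is typically tuned through its configuration properties:

```java
import java.util.Properties;

public class BatchingConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // batch.size: maximum bytes per partition batch; larger batches mean
        // fewer network requests (higher throughput) at the cost of latency
        props.put("batch.size", "16384");
        // linger.ms: how long the producer waits for more records before
        // sending a not-yet-full batch
        props.put("linger.ms", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println("batch.size=" + props.getProperty("batch.size"));
        // Pass these props to new KafkaProducer<>(props) when the
        // kafka-clients jar is on the classpath.
    }
}
```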


2 Answers

This exception indicates you are queueing records at a faster rate than they can be sent.

When you call the send method, the ProducerRecord will be stored in an internal buffer for sending to the broker. The method returns immediately once the ProducerRecord has been buffered, regardless of whether it has been sent.

Records are grouped into batches for sending to the broker, to reduce the transport overhead per message and increase throughput.

Once a record is added to a batch, there is a time limit for sending that batch, to ensure it is sent within a specified duration. This is controlled by the producer configuration parameter request.timeout.ms, which defaults to thirty seconds.

If the batch has been queued longer than the timeout limit, the exception will be thrown. Records in that batch will be removed from the send queue.

Increasing the timeout limit, using the configuration parameter, will allow the client to queue batches for longer before expiring.
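The expiry rule can be modeled in a few lines. This is a simplified sketch of the idea, not the actual Kafka client implementation; the names here are made up:

```java
public class BatchExpiryModel {
    // Simplified rule: a batch is considered expired once it has waited in
    // the send queue longer than request.timeout.ms without being sent.
    static boolean isExpired(long enqueuedAtMs, long nowMs, long requestTimeoutMs) {
        return (nowMs - enqueuedAtMs) > requestTimeoutMs;
    }

    public static void main(String[] args) {
        long requestTimeoutMs = 30_000; // the default of thirty seconds
        // Queued for 10s: still within the limit
        System.out.println(isExpired(0, 10_000, requestTimeoutMs));
        // Queued for 31s: past the limit, the client would throw
        // TimeoutException ("Batch expired") and drop the batch
        System.out.println(isExpired(0, 31_000, requestTimeoutMs));
    }
}
```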

Answered Oct 18 '22 by James Thomas


I got this exception in a completely different context.

I had set up a mini cluster of a ZooKeeper VM, a broker VM and a producer/consumer VM. I opened all necessary ports on the broker (9092) and on ZooKeeper (2181) and then tried to publish a message from the producer/consumer VM to the broker. I got the exception mentioned by the OP, but since I had only tried to publish a single message so far, the solution couldn't be to increase the timeout or the batch size.

So I searched on and found this mailing list thread describing a similar problem I had when trying to consume messages from within the producer/consumer VM (ClosedChannelException): http://grokbase.com/t/kafka/users/152jsjekrm/having-trouble-with-the-simplest-remote-kafka-config

The last post in that thread actually describes how to solve the problem.

Long story short: if you face both the ClosedChannelException and the "Batch expired" exception, you likely have to set the following line in the broker's server.properties file and restart the broker:

advertised.host.name=<broker public IP address>

If it isn't set, Kafka falls back to the host.name property (which probably isn't set either) and then to the canonical host name returned by Java's InetAddress class, which in this setup isn't correct and thus confuses remote nodes.
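A minimal server.properties fragment illustrating this (the IP address is a placeholder; use your broker's actual public address):

```properties
# What the broker advertises to clients. If unset, it falls back to
# host.name, and then to the machine's canonical host name, which
# remote clients often cannot resolve.
advertised.host.name=203.0.113.10
```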

Answered Oct 18 '22 by Roberto