I am working with Kafka and trying to consume data from it. With the loop below, I can poll data from Kafka:
    while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Long.MAX_VALUE);
        for (ConsumerRecord<byte[], byte[]> record : records) {
            // retrieve data
        }
    }
My question is: what benefit do I get from passing Long.MAX_VALUE as the timeout, compared to passing 200? What is the best practice for a system that will be running in production?
Can anyone explain the difference between a high timeout and a low timeout, and which should be used in a production system?
The poll() method is what a Kafka consumer calls to retrieve records from the topics it is subscribed to. The caller passes a timeout argument: the maximum amount of time poll() will block waiting for records when none are available. If records are already available, it returns immediately.
The default value of max.poll.interval.ms is five minutes, so if processing the records from one poll() (for example in consumerRecords.forEach) takes longer than that, your consumer will be considered dead.
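On the producer side, the default blocking timeout is 1 minute. To change it, open Kafka Client Configuration > Producer tab > Advanced Properties, add max.block.ms, and set it to the desired value in milliseconds. Note that max.block.ms is a producer setting and does not change the consumer's poll() timeout.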
If consecutive poll() calls are separated by more than max.poll.interval.ms, the consumer is removed from the group. A related setting, max.poll.records, controls the maximum number of records that a single call to poll() will return.
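As a rough illustration of where these two settings live, here is a minimal sketch that builds consumer properties using plain string keys. The broker address, group id, and the value chosen for max.poll.records are assumptions made up for this example, not recommendations; the Properties object would be passed to a KafkaConsumer constructor in real code.

```java
import java.util.Properties;

class ConsumerConfigSketch {
    // Builds consumer settings as plain strings so the sketch needs no Kafka dependency.
    static Properties consumerProps() {
        Properties props = new Properties();
        // Assumed values for illustration only: adjust to your cluster and group.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Maximum allowed gap between two poll() calls (five minutes, the default).
        props.setProperty("max.poll.interval.ms", "300000");
        // Upper bound on records returned by one poll(); 100 is an arbitrary example value.
        props.setProperty("max.poll.records", "100");
        return props;
    }
}
```

Lowering max.poll.records is one way to keep the processing time per poll() comfortably below max.poll.interval.ms when each record is slow to handle.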
Setting Long.MAX_VALUE makes consumption effectively synchronous: poll() blocks indefinitely until something is returned. A lower value hands control back to your code periodically, so you can decide to do something else instead of waiting. Which one to use depends on your actual scenario.
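To make the trade-off concrete, here is a small sketch that runs without a broker. It is an analogy, not Kafka's API: java.util.concurrent.BlockingQueue.poll(timeout, unit) stands in for consumer.poll, and the class name, method name, and stop flag are hypothetical names invented for this example. With a short timeout the loop regains control regularly and can notice a shutdown request; with an effectively infinite timeout it would stay blocked until a record arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class PollTimeoutSketch {
    // Reads from 'source' until 'stop' is set, waiting at most timeoutMs per poll.
    static List<String> consumeUntilStopped(BlockingQueue<String> source,
                                            AtomicBoolean stop,
                                            long timeoutMs) {
        List<String> seen = new ArrayList<>();
        while (!stop.get()) {
            try {
                // Returns null when nothing arrives within timeoutMs,
                // which hands control back to the loop condition above.
                String record = source.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (record != null) {
                    seen.add(record);
                }
            } catch (InterruptedException e) {
                // Preserve the interrupt status and leave the loop.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return seen;
    }
}
```

With a real KafkaConsumer the same shape applies: a short poll timeout lets the loop check a flag between polls, while a loop that blocks for a very long time is usually interrupted from another thread with consumer.wakeup(), which makes the blocked poll() throw a WakeupException.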