max.poll.records in conjunction with fetch.min.bytes

Tags:

apache-kafka

I have read "How does max.poll.records affect the consumer poll" as well as the Apache Kafka docs, and I am still not sure: if fetch.min.bytes is left at its default of 1, is the Kafka broker obligated to return max.poll.records records when that many are available, or not?

According to our tests, it does not always return that many, even when there is sufficient data available in the topic. Neither the documentation's explanation of that parameter nor its name implies that it should, but some people tend to think the opposite. We also increased the limits that could potentially prevent this from happening, such as message.max.bytes, max.message.bytes, max.partition.fetch.bytes, and fetch.max.bytes (that one we did not actually have to increase, since the default is rather high, 50 MB), but that did not change anything.

We also did not change fetch.max.wait.ms, whose default is 500 ms, i.e. half a second. So if fetch.min.bytes is not set to more than 1 byte, does this setting become the effective one, i.e. does it determine how many records are actually returned? Would that mean that if fewer than max.poll.records records were returned, it is because fetching that many would have taken more than 500 ms?


asked Apr 29 '20 22:04

hdjur_jcv

1 Answer

These 2 configurations can be confusing: at first sight they look similar, but they work in very different ways.

fetch.min.bytes: This value is one of the fields of Fetch Requests (it's min_bytes in http://kafka.apache.org/protocol#The_Messages_Fetch). The broker uses it to decide when to send a Fetch Response back to the client. When a broker receives a Fetch Request, it can hold it for up to fetch.max.wait.ms if fewer than fetch.min.bytes bytes are available for consumption (for example, the consumer is at the end of the log, or the messages to be consumed add up to less than that size).
max.poll.records: This setting is only used within the Consumer and is never sent to brokers. In the background (asynchronously), the consumer client actively fetches records from the broker and buffers them, so that when poll() is called, it can return records already fetched. As the name suggests, this setting controls how many records at most poll() can return from the consumer buffer.
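
To make the split between the two settings concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not Kafka client or broker code: the names broker_should_respond, ToyConsumerBuffer and on_fetch_response are made up for illustration, and the real client is more involved (per-partition buffers, byte limits such as max.partition.fetch.bytes, and so on).

```python
from collections import deque


def broker_should_respond(available_bytes, fetch_min_bytes, waited_ms, fetch_max_wait_ms):
    """Toy broker-side rule: answer the Fetch Request once at least
    fetch.min.bytes are available, or once fetch.max.wait.ms has elapsed."""
    return available_bytes >= fetch_min_bytes or waited_ms >= fetch_max_wait_ms


class ToyConsumerBuffer:
    """Toy client-side buffer (hypothetical, for illustration only).

    max.poll.records is applied here, on the consumer side, as an upper
    bound on what one poll() hands back. It never reaches the broker.
    """

    def __init__(self, max_poll_records):
        self.max_poll_records = max_poll_records
        self.buffer = deque()

    def on_fetch_response(self, records):
        # Records from a Fetch Response are buffered until poll() asks for them.
        self.buffer.extend(records)

    def poll(self):
        # Return at most max_poll_records, or fewer if fewer are buffered.
        count = min(self.max_poll_records, len(self.buffer))
        return [self.buffer.popleft() for _ in range(count)]


if __name__ == "__main__":
    buf = ToyConsumerBuffer(max_poll_records=500)

    # A Fetch Response that carried only 120 records: poll() returns those 120.
    buf.on_fetch_response(range(120))
    print(len(buf.poll()))

    # A larger response of 1200 records is handed out in capped slices.
    buf.on_fetch_response(range(1200))
    print(len(buf.poll()), len(buf.poll()), len(buf.poll()))
```

In this model, max.poll.records is a cap and not a target: poll() hands back whatever is already buffered, up to the cap. With fetch.min.bytes at 1, the broker rule is satisfied as soon as any data is available, so fetch.max.wait.ms only matters when there is nothing to return yet.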


answered Oct 22 '22 21:10

Mickael Maison
