I have read "How does max.poll.records affect the consumer poll" as well as the Apache Kafka docs, and I am still not sure: if fetch.min.bytes is left at its default of 1, is the Kafka broker obligated to return max.poll.records records when that many are available, or not?
According to our tests, it does not always return that many, even when there is sufficient data available in the topic. Neither the documentation's explanation of the parameter nor its name implies that it should, but some people think otherwise. We also increased the limits that could potentially prevent this from happening, namely message.max.bytes, max.message.bytes, max.partition.fetch.bytes, and fetch.max.bytes (that one we did not actually have to increase, since its default is rather high at 50 MB), but that did not change anything.
We also did not change fetch.max.wait.ms, whose default is 500, that is, half a second. So if fetch.min.bytes is not set to more than 1 byte, does fetch.max.wait.ms become the effective setting, i.e., does it determine how many records are actually returned? Would that mean that if fewer than max.poll.records records were returned, it is because fetching that many would have taken more than 500 ms?
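For reference, the consumer-side settings named in the question can be written down as plain properties. This is only a sketch: the class name ConsumerFetchSettings is hypothetical, the values are the defaults as I understand them from the Kafka documentation, and message.max.bytes (broker) and max.message.bytes (topic) are left out because they are not consumer settings.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class ConsumerFetchSettings {

    /** Returns the fetch-related consumer settings discussed above, at their default values. */
    static Properties defaults() {
        Properties props = new Properties();
        props.setProperty("max.poll.records", "500");            // cap on records returned by one poll()
        props.setProperty("fetch.min.bytes", "1");               // broker answers as soon as 1 byte is available
        props.setProperty("fetch.max.wait.ms", "500");           // longest the broker may hold a fetch request
        props.setProperty("max.partition.fetch.bytes", "1048576"); // 1 MiB per partition
        props.setProperty("fetch.max.bytes", "52428800");        // 50 MiB per fetch response
        return props;
    }

    public static void main(String[] args) {
        defaults().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A Properties object like this is what one would normally pass to the consumer constructor, together with the connection and deserializer settings that are omitted here.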
max.partition.fetch.bytes: Sets a maximum limit in bytes on how much data is returned for each partition, which must always be larger than the number of bytes set in the broker or topic configuration for max.message.
The default value of max.poll.interval.ms is 300000 ms, which is 5 minutes. When it takes more than 5 minutes to consume one message, the machine would be kicked out of the consumer group, which was not what I wanted.
fetch.max.wait.ms lets you control how long to wait. By default, Kafka will wait up to 500 ms. This results in up to 500 ms of extra latency in case there is not enough data flowing to the Kafka topic to satisfy the minimum amount of data to return.
poll: public ConsumerRecords<K,V> poll(long timeout). Fetch data for the topics or partitions specified using one of the subscribe/assign APIs. It is an error to not have subscribed to any topics or partitions before polling for data.
These two configurations can be confusing: at first sight they look similar, but they work in very different ways.
fetch.min.bytes: This value is one of the fields of Fetch Requests (it is min_bytes in http://kafka.apache.org/protocol#The_Messages_Fetch). The broker uses it to decide when to send a Fetch Response back to the client. When a broker receives a Fetch Request, it can hold it for up to fetch.max.wait.ms if fewer than fetch.min.bytes bytes are available for consumption (for example, the consumer is at the end of the log, or the messages to be consumed add up to less than that size).
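The hold rule described above can be sketched as a single predicate. This is a simplified model, not broker code: the class FetchHoldSketch and the method shouldRespond are hypothetical names, and a real broker takes more into account than these two thresholds.

```java
// Toy model of the broker-side decision; names are made up for illustration.
final class FetchHoldSketch {

    /**
     * Returns true when the simplified broker would answer the fetch now:
     * either enough bytes are available, or the request has been held long enough.
     */
    static boolean shouldRespond(long availableBytes, long fetchMinBytes,
                                 long waitedMs, long fetchMaxWaitMs) {
        return availableBytes >= fetchMinBytes || waitedMs >= fetchMaxWaitMs;
    }

    public static void main(String[] args) {
        // With the defaults fetch.min.bytes = 1 and fetch.max.wait.ms = 500:
        System.out.println(shouldRespond(1, 1, 0, 500));   // true: one byte is enough
        System.out.println(shouldRespond(0, 1, 499, 500)); // false: nothing to send, still waiting
        System.out.println(shouldRespond(0, 1, 500, 500)); // true: wait limit reached
    }
}
```

In this model, with fetch.min.bytes at 1 the wait limit only matters when there is nothing at all to return, which is the case of a consumer sitting at the end of the log.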
max.poll.records: This setting is only used within the consumer and is never sent to brokers. In the background (asynchronously), the consumer client actively fetches records from the broker and buffers them, so that when poll() is called it can return records already fetched. As the name suggests, this setting controls how many records at most poll() can return from the consumer buffer.
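To make the "at most" part concrete, here is a toy model of a client-side buffer with a cap on what each poll() hands out. It does not use the Kafka client library; PollBufferSketch, onFetchResponse and the String records are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the consumer-side buffer; not the real KafkaConsumer.
final class PollBufferSketch {
    private final Deque<String> buffer = new ArrayDeque<>();
    private final int maxPollRecords;

    PollBufferSketch(int maxPollRecords) {
        this.maxPollRecords = maxPollRecords;
    }

    /** Stands in for a fetch response arriving from the broker. */
    void onFetchResponse(List<String> records) {
        buffer.addAll(records);
    }

    /** Returns up to maxPollRecords buffered records, or fewer if the buffer holds fewer. */
    List<String> poll() {
        List<String> out = new ArrayList<>();
        while (out.size() < maxPollRecords && !buffer.isEmpty()) {
            out.add(buffer.pollFirst());
        }
        return out;
    }

    public static void main(String[] args) {
        PollBufferSketch consumer = new PollBufferSketch(3);
        consumer.onFetchResponse(List.of("a", "b", "c", "d", "e"));
        System.out.println(consumer.poll()); // [a, b, c]
        System.out.println(consumer.poll()); // [d, e]
        System.out.println(consumer.poll()); // []
    }
}
```

The second call returns only two records even though the cap is three, because that is all the buffer holds at that moment. In this model the cap is an upper bound and never a promise, regardless of how much data is still sitting in the topic.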