
max.poll.records in conjunction with fetch.min.bytes

Tags:

apache-kafka

I am reading "How does max.poll.records affect the consumer poll" as well as the Apache Kafka docs, and I am still not sure: if fetch.min.bytes is left at its default of 1, is the Kafka broker obligated to return max.poll.records records when that many are available, or not?

According to our tests, it does not always return that many, even when there is sufficient data available in the topic. Neither the documentation for that parameter nor its name implies that it should, but some people think otherwise. We also increased the limits that could potentially prevent this from happening, such as message.max.bytes, max.message.bytes, max.partition.fetch.bytes, and fetch.max.bytes (we did not actually have to increase that last one, since its default of 50 MB is rather high), but that changed nothing.

We also left fetch.max.wait.ms at its default of 500 ms, i.e. half a second. So if fetch.min.bytes is not set to more than 1 byte, does this setting become the effective one, i.e. does it determine how many records are actually returned? Would that mean that if fewer than max.poll.records records were returned, it is because fetching that many would have taken more than 500 ms?

hdjur_jcv asked Apr 29 '20 22:04





1 Answer

These two configurations can be confusing: at first sight they look similar, but they work in very different ways.

  • fetch.min.bytes: This value is one of the fields of a Fetch request (it is min_bytes in http://kafka.apache.org/protocol#The_Messages_Fetch). The broker uses it to decide when to send a Fetch response back to the client. When a broker receives a Fetch request, it can hold it for up to fetch.max.wait.ms if fewer than fetch.min.bytes bytes are available for consumption (for example, the consumer is at the end of the log, or the messages to be consumed add up to less than that size).

  • max.poll.records: This setting is only used within the consumer and is never sent to brokers. In the background (asynchronously), the consumer client actively fetches records from the broker and buffers them, so that when poll() is called it can return records that have already been fetched. As the name suggests, this setting controls how many records at most poll() can return from the consumer buffer.
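
To make the difference concrete, here is a minimal toy model in Python. It is not the real Kafka client or broker code: `broker_should_respond` and `ToyConsumer` are hypothetical names invented for illustration, and the defaults (1 byte, 500 ms, 500 records) are the documented defaults of the corresponding settings. The sketch only shows that one setting gates when the broker answers, while the other caps how much poll() hands out from the local buffer.

```python
from collections import deque


def broker_should_respond(available_bytes, waited_ms,
                          fetch_min_bytes=1, fetch_max_wait_ms=500):
    """Toy broker-side check (hypothetical helper, not Kafka code):
    answer once enough bytes are available, or once the request has
    been held for fetch.max.wait.ms."""
    return available_bytes >= fetch_min_bytes or waited_ms >= fetch_max_wait_ms


class ToyConsumer:
    """Hypothetical client-side buffer; not the real KafkaConsumer."""

    def __init__(self, max_poll_records=500):
        self.max_poll_records = max_poll_records
        self.buffer = deque()

    def on_fetch_response(self, records):
        # Everything the broker sent is buffered; max.poll.records
        # plays no role at this stage.
        self.buffer.extend(records)

    def poll(self):
        # Hand out at most max_poll_records of what is already buffered.
        count = min(self.max_poll_records, len(self.buffer))
        return [self.buffer.popleft() for _ in range(count)]


consumer = ToyConsumer(max_poll_records=3)

consumer.on_fetch_response(["r1", "r2"])
first = consumer.poll()    # only two records were buffered

consumer.on_fetch_response(["r3", "r4", "r5", "r6"])
second = consumer.poll()   # capped at max_poll_records
```

In this model, a poll() that returns fewer than max.poll.records records simply means the buffer held fewer at that moment; the cap is an upper bound, not a target the broker tries to reach.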

Mickael Maison answered Oct 22 '22 21:10
