 

Improving performance of Kafka Producer

Tags:

apache-kafka

We're running Apache Kafka 0.10.0.x and Spring 3.x, and we cannot use Spring Kafka because it requires Spring Framework 4.x.

Therefore, we are using the native Kafka Producer API to produce messages.

My concern is the performance of my producer. I believe that a call to producer.send() is what actually opens the connection to the Kafka broker, then puts the message into the buffer, attempts to send it, and possibly invokes the callback provided to producer.send().

The KafkaProducer documentation says that it uses a buffer and a separate I/O thread to perform the send, and that the producer should be closed properly so that these resources are not leaked.

From what I understand, this means that if I send hundreds of messages, every call to producer.send() attempts to connect to the broker, which is an expensive I/O operation.

Can you please correct my understanding if I am wrong, or suggest a better way to use the KafkaProducer?

Asked by RookieDev on Dec 11 '22 at 10:12


2 Answers

The two important configuration parameters of the Kafka producer here are 'batch.size' and 'linger.ms'. They give you a choice: the producer sends a batch either when it is full or when the linger time expires, whichever happens first.

  • batch.size: the upper limit, specified in bytes rather than in number of messages, on how much data the producer will collect into a single batch before sending it.

  • linger.ms: how long the producer will wait before sending, so that more messages can accumulate in the same batch.

It depends on your use case, but I would suggest taking a closer look at these parameters.
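A minimal sketch of how these two settings might be passed to the native producer, assuming String keys and values and a placeholder broker address. The class `BatchingConfig` and its method are hypothetical; only the property names (`bootstrap.servers`, `key.serializer`, `value.serializer`, `batch.size`, `linger.ms`) are real producer configs, and the right values depend on your workload.

```java
import java.util.Properties;

/**
 * Hypothetical helper that builds producer settings focused on batching.
 * The keys are documented Kafka producer config names; the values are
 * supplied by the caller and are workload-dependent.
 */
final class BatchingConfig {

    private BatchingConfig() {}

    static Properties producerProps(String bootstrapServers, int batchSizeBytes, long lingerMs) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound, in bytes, on one batch (Kafka default: 16384).
        props.put("batch.size", Integer.toString(batchSizeBytes));
        // Extra wait for more records before sending a batch that is not yet full (Kafka default: 0).
        props.put("linger.ms", Long.toString(lingerMs));
        return props;
    }
}
```

The resulting Properties would then be handed to `new KafkaProducer<String, String>(props)`. Raising `linger.ms` above 0 trades a few milliseconds of latency for fuller batches, so fewer and larger requests reach the broker.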

Answered by codejitsu on Feb 12 '23 at 04:02


Your understanding is partially right.

As @leshkin pointed out, there are configuration parameters to tune how the KafkaProducer buffers messages before sending them.

However, independently of the buffering strategy, the producer caches the connections it has already established to the brokers that lead the partitions it writes to.

Indeed, you can tune how long the producer keeps such a connection open while it is idle using the connections.max.idle.ms parameter (default 540000 ms, i.e. 9 minutes).
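A small sketch of overriding that timeout, assuming you already have a Properties object for the producer. `IdleConnectionConfig` and `withMaxIdle` are hypothetical names; `connections.max.idle.ms` is the real producer setting and is expressed in milliseconds.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: sets how long an idle broker connection is kept open. */
final class IdleConnectionConfig {

    private IdleConnectionConfig() {}

    static Properties withMaxIdle(Properties base, long amount, TimeUnit unit) {
        Properties props = new Properties();
        props.putAll(base); // copy the entries so the caller's Properties stay unchanged
        // Connections idle for longer than this are closed (producer default: 540000 ms).
        props.put("connections.max.idle.ms", Long.toString(unit.toMillis(amount)));
        return props;
    }
}
```

For example, `withMaxIdle(props, 15, TimeUnit.MINUTES)` would set the value to 900000.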

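So, to answer your original question: the I/O cost of establishing a connection to a broker is paid on the first send() (and again only if the connection was closed after being idle longer than connections.max.idle.ms), and it is amortised over time as long as you keep sending data through the same producer.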
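Caching only helps if the producer outlives individual sends. A minimal sketch, assuming you want one producer for the whole application: the hypothetical `SharedClient` holder below (JDK only, not part of Kafka or Spring) creates its client on first use, returns the same instance afterwards, and closes it once. In the question's setup the factory would build `new KafkaProducer<String, String>(props)` (as a lambda, or an anonymous class on Java 7); KafkaProducer implements Closeable and its Javadoc describes it as thread-safe, so one instance can be shared across threads.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical factory interface so the holder stays independent of any client library. */
interface ClientFactory<T> {
    T create();
}

/**
 * Hypothetical holder: creates one client lazily, returns the same instance
 * to every caller, and closes it at most once at shutdown.
 */
final class SharedClient<T extends Closeable> implements Closeable {
    private final ClientFactory<T> factory;
    private T instance;
    private boolean closed;

    SharedClient(ClientFactory<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on the first call only. */
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("client already closed");
        }
        if (instance == null) {
            instance = factory.create();
        }
        return instance;
    }

    /** Closes the shared instance once; later calls do nothing. */
    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }
}
```

The holder's close() could be wired to a Spring `destroy-method` or a JVM shutdown hook. KafkaProducer.close() blocks until previously sent requests complete, so buffered messages are not silently dropped at shutdown.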

Answered by nivox on Feb 12 '23 at 04:02