
Connection management when using a Kafka producer in a high-traffic environment

I am going to use Kafka in a very high-traffic environment of more than a billion requests per day. Every request will open a connection to the Kafka cluster to send a message, so a large number of connections will be created continuously every second. Because the producer's connections are all non-persistent, this could lead to socket timeouts or port exhaustion.

Most of our ecosystem is in PHP, so I have to use a PHP library for Kafka. How can I use the Kafka producer effectively to mitigate this connection contention?

I thought of a daemon process that is fed messages and then sends them to the Kafka cluster in batches. The upside is that the number of connections stays limited. The downside is that the response latency of such a service will affect the application, and I would also need some intermediate storage to hold the messages.
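To make the daemon idea concrete, here is a minimal sketch of the batching part only. It assumes an in-memory queue as the intermediate storage and a hypothetical `send_batch` callable standing in for whatever actually publishes to Kafka; the class name, batch size and wait time are illustrative, not taken from any Kafka library.

```python
import queue
import threading
import time
from typing import Callable, List


class BatchingForwarder:
    """Buffers messages in memory and hands them to `send_batch` in groups.

    `send_batch` is a hypothetical stand-in for the code that publishes a
    batch through one long-lived producer connection.
    """

    def __init__(self, send_batch: Callable[[List[bytes]], None],
                 max_batch: int = 500, max_wait_s: float = 0.05,
                 capacity: int = 100_000) -> None:
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._queue: "queue.Queue[bytes]" = queue.Queue(maxsize=capacity)
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, message: bytes) -> bool:
        """Non-blocking enqueue; returns False when the buffer is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            return False

    def close(self) -> None:
        """Stop the worker once everything already queued has been sent."""
        self._stop.set()
        self._worker.join()

    def _run(self) -> None:
        # Keep going until a stop was requested and the queue is drained.
        while not (self._stop.is_set() and self._queue.empty()):
            batch = self._collect()
            if batch:
                self._send_batch(batch)

    def _collect(self) -> List[bytes]:
        # Gather up to max_batch messages, waiting at most max_wait_s.
        batch: List[bytes] = []
        deadline = time.monotonic() + self._max_wait_s
        while len(batch) < self._max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._queue.get(timeout=remaining))
            except queue.Empty:
                break
        return batch
```

With this shape the caller never waits on the network: `submit` returns immediately, and the extra delay before a message is sent is bounded by `max_wait_s`. The trade-off is that anything still in the in-memory queue is lost if the daemon crashes, which is why some more durable intermediate storage may be needed.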

I know that many extremely high-volume applications and sites use Kafka to stream messages directly. Can anyone guide me on how to tackle these issues? Can persistent connections help in this case? Is using a PHP Kafka producer library in such a high-volume environment a bad idea in itself?

Shades88 asked May 07 '15 10:05




1 Answer

We also use the Kafka Java library, and we do it the way @apatel describes. In your situation, you could try running a sidecar alongside the PHP app on each server: the sidecar creates a Producer at startup, and the Kafka Java driver manages the connections to the cluster. Here is an interesting article about Netflix's sidecar application: Netflix Prana.
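As a rough illustration of the sidecar shape (not of Prana or of our actual setup), the sketch below runs a small local listener that the PHP app could send messages to over loopback UDP. Everything here is an assumption made for the example: the choice of UDP, the function names, and the `forward` callable, which stands in for the send method of the one producer the sidecar creates at startup. UDP is fire-and-forget, so this variant accepts occasional message loss in exchange for not holding any connection per request.

```python
import socket
import socketserver
import threading
from typing import Callable, Tuple


def make_sidecar(forward: Callable[[bytes], None],
                 host: str = "127.0.0.1",
                 port: int = 0) -> socketserver.UDPServer:
    """Build a UDP server that passes each datagram payload to `forward`.

    `forward` is a hypothetical hook; in a real sidecar it would hand the
    payload to the single long-lived Kafka producer. Port 0 lets the OS
    pick a free port, readable afterwards from `server.server_address`.
    """

    class Handler(socketserver.BaseRequestHandler):
        def handle(self) -> None:
            # For UDP servers, self.request is a (data, socket) pair.
            payload, _sock = self.request
            forward(payload)

    return socketserver.UDPServer((host, port), Handler)


def serve_in_background(server: socketserver.UDPServer) -> threading.Thread:
    """Run the server loop on a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def send_to_sidecar(address: Tuple[str, int], payload: bytes) -> None:
    """What the application side does per request: one local datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.sendto(payload, address)
```

The application then only ever talks to localhost, and the number of connections to the Kafka cluster depends on the number of sidecars rather than on the request rate.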

Paweł Szymczyk answered Sep 17 '22 18:09