
How are TCP Connections managed by kafka-clients scala library?

I am using the kafka-clients library to integrate Kafka with a Scala application, and I am finding it difficult to understand how and when TCP connections are made between brokers and producers/consumers.

Please verify my understanding of the points below:

(1) No TCP connection is established on initialisation of KafkaProducer instance.

val producer = new KafkaProducer[String, String](properties)

This also holds true for KafkaConsumer.

val consumer = new KafkaConsumer[String, String](properties)

(2) First TCP connection (between Broker and Producer) is established on producing a record to Broker.

producer.send(record1)

(3) Subsequent send() calls from the same Producer to same Broker will share same TCP connection irrespective of the Topic.

producer.send(record2)

(4) In case of Consumer, first TCP connection is established on polling a Topic (not on Subscription).

val records = consumer.poll(timeout)

(5) Subsequent calls to poll by the same Consumer to the same Broker share the same connection.

Dollyg asked Dec 11 '22 08:12


2 Answers

No TCP connection is established on initialisation of KafkaProducer instance.

Not exactly. Initialising a KafkaProducer starts the Sender thread, and from within that thread TCP connections to all the bootstrap servers are established. Those sockets are used to retrieve metadata from the cluster.

First TCP connection (between Broker and Producer) is established on producing a record to Broker.

Almost correct. The client actually always creates multiple TCP connections to the brokers, even when there is only one broker. A producer often creates two connections: one for updating metadata and one for sending messages. A consumer (assuming you are using a consumer group) seems to create three connections: one for finding the coordinator, one for group management (including joining/syncing the group and offset handling), and one for fetching messages.
UPDATE: the consumer creates 3 connections, not 4 as I previously claimed. Thanks to @ppatierno for the reminder.

Subsequent send() calls from the same Producer to same Broker will share same TCP connection irrespective of the Topic.

Subsequent send() calls reuse the second connection the producer creates, that is, the one used for sending messages.

In case of Consumer, first TCP connection is established on polling a Topic (not on Subscription).

Yes, all of the consumer's connections are created inside the poll() call.

Subsequent calls to poll by the same Consumer to the same Broker share the same connection.

Subsequent calls to poll() reuse the last connection the consumer creates, that is, the one used for fetching messages.
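
One way to check these claims against your own client version is to read the producer's connection-count metric before and after the first send(). The following is only a sketch under assumptions: it needs kafka-clients 1.0 or later (for Metric.metricValue()) and Scala 2.13 or later, and the broker address localhost:9092, the topic test-topic, and the object name ConnectionCountProbe are all made up for illustration. Because the Sender thread connects asynchronously, the values printed depend on timing.

```scala
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ConnectionCountProbe {
  // Looks up the "connection-count" metric in the "producer-metrics" group, if present.
  def connectionCount(producer: KafkaProducer[String, String]): Option[Any] =
    producer.metrics().asScala.collectFirst {
      case (name, metric)
          if name.group() == "producer-metrics" && name.name() == "connection-count" =>
        metric.metricValue()
    }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Read the metric right after construction, before any record is sent.
      println(s"after construction: ${connectionCount(producer)}")
      // Block until the first record is acknowledged, then read the metric again.
      producer.send(new ProducerRecord[String, String]("test-topic", "key", "value")).get()
      println(s"after first send:   ${connectionCount(producer)}")
    } finally producer.close()
  }
}
```

The same approach should work for a consumer by reading KafkaConsumer.metrics() and looking in the "consumer-metrics" group instead.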

amethystic answered Mar 08 '23 22:03


Subsequent send() calls from the same Producer to same Broker will share same TCP connection irrespective of the Topic.

Just to add (to the great answer by @amethystic): if the producer tries to send to a new topic and the broker it is connected to isn't the leader for that topic, the producer needs to fetch metadata about that topic and open a new connection to the broker that is the leader. So saying "share same TCP connection irrespective of the Topic" is not completely correct.
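
The point that connections are tied to brokers rather than topics can be illustrated with a toy model. This is not the kafka-clients implementation: ToyClient, the leaders map, and the broker ids are all hypothetical, and each topic is simplified to a single leader (in Kafka, leadership is per partition).

```scala
import scala.collection.mutable

// Toy model only: mimics "one connection per broker", not the real client internals.
final class ToyClient(leaders: Map[String, Int]) {
  // Broker ids this client currently has a simulated connection to.
  private val connections = mutable.LinkedHashSet.empty[Int]

  /** Returns true if sending to `topic` had to open a new connection. */
  def send(topic: String): Boolean = {
    val leader = leaders(topic) // stand-in for the metadata lookup
    connections.add(leader)     // add() is true only if the broker was not connected yet
  }

  def openConnections: Set[Int] = connections.toSet
}

object ToyClientDemo {
  def main(args: Array[String]): Unit = {
    val client = new ToyClient(Map("orders" -> 1, "payments" -> 1, "audit" -> 2))
    println(client.send("orders"))   // first send to broker 1
    println(client.send("payments")) // different topic, same leader broker
    println(client.send("audit"))    // topic led by a different broker
    println(client.openConnections)
  }
}
```

In this model, a second topic led by the same broker reuses the existing connection, while a topic led by another broker opens a new one.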

ppatierno answered Mar 08 '23 23:03