Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Kafka Zookeeper Connection drop continuously

I have setup Kafka 3-node cluster and Zookeeper 3-node cluster, on separate nodes. Using Kafka I can produce and consume messages successfully and run commands like kafka-topic.sh to get topic lists and their informations from Zookeeper, but there are some errors on Kafka server.log file. The following warning appears continuously:

[2018-02-18 21:50:01,241] WARN Client session timed out, have not heard from server in 320190154ms for sessionid 0x161a94b101f0001 (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:01,242] INFO Client session timed out, have not heard from server in 320190154ms for sessionid 0x161a94b101f0001, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:01,343] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2018-02-18 21:50:01,989] INFO Opening socket connection to server zookeeper3/192.168.1.206:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,008] INFO Socket connection established to zookeeper3/192.168.1.206:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,042] INFO Session establishment complete on server zookeeper3/192.168.1.206:2181, sessionid = 0x161a94b101f0001, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,042] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2018-02-18 21:59:31,570] INFO [Group Metadata Manager on Broker 102]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

It seems the Kafka sessions in zookeeper expires periodically!

In Zookeeper logs are the following warninngs, too:

2018-02-18 18:20:06,149 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x161a94b101f0001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:748)
2018-02-18 18:20:06,151 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /192.168.1.203:43162 which had sessionid 0x161a94b101f0001
2018-02-18 18:20:06,781 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x161a94b101f0002, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:748)
2018-02-18 18:20:06,782 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /192.168.1.201:45330 which had sessionid 0x161a94b101f0002
2018-02-18 18:37:29,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.1.202:52480
2018-02-18 18:37:29,139 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /192.168.1.202:52480
2018-02-18 18:37:29,143 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@687] - Established session 0x161a94b101f0003 with negotiated timeout 30000 for client /192.168.1.202:52480
2018-02-18 18:37:29,432 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /192.168.1.202:52480 which had sessionid 0x161a94b101f0003

I think it's because zookeeper can't get heartbeat from Kafka nodes. The followings are Zookeeper zoo.cfg:

tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

and Kafka server.properties customized setting:

broker.id=1
listeners = PLAINTEXT://kafka1:9092
num.partitions=24
delete.topic.enable=true
default.replication.factor=2
log.dirs=/data/kafka/data
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
log.retention.hours=168

I use the same zookeeper cluster for Hadoop HA without any problem. I think there is something wrong with the Kafka properties listeners and advertised.listeners. I read the Kafka documentation but couldn't understand their meaning.

In the host file of all OSes, hostnames such that zookeeper1 to zookeeper3 and kafka1 to kafka3 are defined and reachable through ping command. I removed the following lines from hosts:

127.0.0.1       localhost
127.0.1.1       hostname

I think it couldn't cause the problem.

  • Kafka version: 0.11
  • Zookeeper version: 3.4.10

Can anyone help?

like image 982
Soheil Pourbafrani Avatar asked Oct 23 '25 15:10

Soheil Pourbafrani


1 Answers

We were facing a similar issue with Kafka. As @Soheil pointed out it was due to a Major GC running.

When a Major GC runs, then Kafka would sometimes not be able to send heartbeat to zookeeper. For us the Major GC was running almost once every 15 sec. On taking a heap dump, we realized it was due to a Metric Memory Leak in Kafka.

like image 158
Kishore Bandi Avatar answered Oct 25 '25 22:10

Kishore Bandi



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!