
Zookeeper keeps getting the WARN: "caught end of stream exception"

I am running a CDH 5.3.1 cluster with three ZooKeeper instances on three hosts:

133.0.127.40 n1
133.0.127.42 n2
133.0.127.44 n3

Everything worked fine at first, but recently I noticed that node n2 keeps logging this WARN:

caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid **0x0**, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:722)

It happens every second, and only on n2; n1 and n3 are fine. I can still use the HBase shell to scan my table and the Solr web UI to run queries, but I cannot start Flume agents. The startup always stalls at this step:

Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog

jetty-6.1.26.cloudera.4

Started [email protected]:41414.

A few minutes later, Cloudera Manager warns that the Flume agent is exceeding its file descriptor threshold.

Does anyone know what is going wrong? Thanks in advance.

Baskwind asked Sep 27 '22 22:09



1 Answer

I recall seeing similar errors in ZooKeeper (admittedly not with Flume). I believe the cause at the time was the large amount of data stored on the node and/or transferred to the client. Settings to consider tweaking in zoo.cfg:

  • put a limit on autopurge.snapRetainCount, e.g. set it to 10
  • set autopurge.purgeInterval to, say, 2 (hours)
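
As a rough illustration, the two settings above would look like this in zoo.cfg. This is only a sketch using the example values from the list, not tuned recommendations, and it assumes a hand-edited zoo.cfg; on a cluster managed by Cloudera Manager the file is normally generated, so the equivalent settings would likely have to be changed there instead.

```
# zoo.cfg fragment (example values from the list above, not tuned recommendations)

# Number of most recent snapshots (and matching transaction logs) to keep
autopurge.snapRetainCount=10

# How often the purge task runs, in hours
autopurge.purgeInterval=2
```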

If the ZK client (Flume?) is streaming large znodes to/from the ZK cluster, you may want to set the Java system property jute.maxbuffer on the client JVM(s), and possibly on the server nodes, to a large enough value. I believe the default for this property is just under 1 MB (0xfffff bytes). Finding the right value for your workload is an exercise in trial and error, I'm afraid!
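
To check which value a given JVM would actually pick up, a small stand-alone class like the one below can help. It is a minimal sketch and does not use the ZooKeeper libraries: the class name JuteMaxBufferCheck and the helper effectiveMaxBuffer are hypothetical, and the fallback of 0xfffff bytes is an assumption about the default that should be verified against your ZooKeeper version.

```java
public class JuteMaxBufferCheck {
    // Assumed default when the property is unset: 0xfffff bytes (just under 1 MB).
    // This is an assumption for illustration; verify it for your ZooKeeper version.
    static final int ASSUMED_DEFAULT = 0xfffff;

    // Returns the jute.maxbuffer value this JVM was started with,
    // or the assumed default if the system property is not set.
    static int effectiveMaxBuffer() {
        return Integer.getInteger("jute.maxbuffer", ASSUMED_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println("jute.maxbuffer = " + effectiveMaxBuffer() + " bytes");
    }
}
```

Running it as `java -Djute.maxbuffer=4194304 JuteMaxBufferCheck` shows the value being passed with the same -D flag that would go on the client or server JVM; the 4 MB figure is only an example, not a recommendation.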

Aeham answered Oct 19 '22 12:10
