I am currently running a CDH 5.3.1 cluster with three ZooKeeper instances on three hosts:
133.0.127.40 n1
133.0.127.42 n2
133.0.127.44 n3
Everything worked fine at first, but recently I have noticed that node n2 keeps logging this WARN:
caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:722)
It happens every second, and only on n2; n1 and n3 are fine. I can still use the HBase shell to scan my tables and the Solr web UI to run queries, but I cannot start the Flume agents: every agent process stalls at this step:
Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
jetty-6.1.26.cloudera.4
Started [email protected]:41414.
A few minutes later, Cloudera Manager warns that the Flume agent is exceeding the file descriptor threshold.
Does anyone know what is going wrong? Thanks in advance.
I recall seeing similar errors in ZK (admittedly not with Flume). I believe the problem at the time was to do with the large amount of data stored on the node and/or transferred to the client. Things to consider tweaking in zoo.cfg:
- autopurge.snapRetainCount, e.g. set it to 10
- autopurge.purgeInterval, set it to, say, 2 (hours)

If the ZK client (Flume?) is streaming large znodes to/from the ZK cluster, you may want to set the Java system property jute.maxbuffer on the client JVM(s), and possibly on the server nodes, to a large enough value. I believe the default value for this property is 1M. Determining the appropriate value for your workload is an exercise in trial and error, I'm afraid.
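For illustration, a minimal sketch of what those settings could look like; the concrete values (10 snapshots, 2 hours, 4 MB) are placeholders you would tune for your own workload, and where exactly the JVM flag goes depends on how your Flume agents are launched (in CDH it would typically be added through the Flume agent's Java options in Cloudera Manager):

# zoo.cfg on each ZooKeeper server (n1, n2, n3)
autopurge.snapRetainCount=10
autopurge.purgeInterval=2

# JVM flag for the ZK client (and, if needed, the ZK servers); 4194304 bytes = 4 MB
-Djute.maxbuffer=4194304

Remember that the ZooKeeper servers and the Flume agents need to be restarted for these changes to take effect, and that jute.maxbuffer should be set consistently on clients and servers.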