How to recover from the following error that started happening after a server crash? Zookeeper won’t start and the following message is showing repeatedly on the log.
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=amd64
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:os.version=3.10.0-514.16.1.el7.x86_64
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.name=zookeeper
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/opt/zookeeper
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@829] - tickTime set to 2000
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@838] - minSessionTimeout set to -1
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@847] - maxSessionTimeout set to -1
2017-05-27 01:02:08,080 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-05-27 01:02:08,385 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,400 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,404 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,404 [myid:] - ERROR [main:ZooKeeperServerMain@64] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:585)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:604)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:570)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:283)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:410)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:119)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:87)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
thanks IPVP
The solution for me was to find the 0 length log file in /hadoop/zookeeper/version-2 (or whichever place your dataDir is) and delete it. Start ZooKeeper afterwards.
The solution for me was to find the last log file (which had 0 byte length)
You will find this inside the version-2
directory
ls -l -r --sort=time
-rw-r--r-- 1 chris chris 67108880 Jan 24 10:37 log.23c6a70
-rw-r--r-- 1 chris chris 0 Jan 24 10:37 log.23d3fb4
I've tried first to delete the snapshot and the last 2 logfiles which is also working but then you would have version which is "a bit" older.
-rw-r--r-- 1 chris chris 3685904 Jan 24 00:56 snapshot.23c6a6e
Maybe you have to delete the last snapshot file and the last logfile together and the 0 length logfile to be safe.
btw. Logfile and snapshot have the same HEX pattern which have to match
log.23c6a70
snapshot.23c6a6e
They have to match and be consistent and you should have this problem fixed.
It looks like you have encountered a known Apache ZooKeeper bug. There are a few different Apache JIRA issues related to this: ZOOKEEPER-1621 and ZOOKEEPER-2332. See the comments in those issues if you're interested in root cause analysis and some potential proposed fixes.
Unfortunately, there is no Apache ZooKeeper release that contains a fix for the bug at this time. There are a few potential workarounds that you could try:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With