I need to debug the data being exchanged between my Kafka consumer and ZooKeeper using tcpdump. I went through the ZooKeeper documentation but could not find any write-up about the ZooKeeper communication protocol. For example, I get the following data dump using Wireshark after removing headers. How do I interpret the data part?
Frame 1: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Ethernet II, Src: 22:00:0a:xx:xx:xx (22:00:xx:xx:xx:xx), Dst: fe:ff:xx:xx:xx:xx (fe:ff:ff:xx:xx:xx)
Internet Protocol Version 4, Src: 10.234.xxx.xxx, Dst: 10.231.xxx.xxx
Transmission Control Protocol, Src Port: 51720 (51720), Dst Port: 2181 (2181), Seq: 1, Ack: 1, Len: 12
Data (12 bytes)
Data: 00000008fffffffe0000000b
[Length: 12]
ZooKeeper can be viewed as an atomic broadcast system, through which updates are totally ordered. The ZooKeeper Atomic Broadcast (ZAB) protocol is the core of the system.
Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.
Consistency algorithm: although ZooKeeper provides similar functionality to the Paxos algorithm, the core consensus algorithm of ZooKeeper is not Paxos. The algorithm used in ZooKeeper is called ZAB, short for ZooKeeper Atomic Broadcast. Like Paxos, it relies on a quorum for durability.
ZooKeeper is an open source Apache project that provides a centralized service for configuration information, naming, synchronization and group services over large clusters in distributed systems. The goal is to make these systems easier to manage with improved, more reliable propagation of changes.
Sorry, but I'm not aware of any convenient documentation that describes the Apache ZooKeeper wire protocol in any great detail. Internally, our codebase is using a framework called Jute, which is based on code originally adapted from Apache Hadoop. The framework allows definition of structured records, generates code based on those definitions, and then provides serialization/deserialization routines called by the rest of the ZooKeeper code.
The Jute record definitions are visible here:
https://github.com/apache/zookeeper/blob/release-3.4.9/src/zookeeper.jute
The Jute framework code for handling these record definitions is visible here:
https://github.com/apache/zookeeper/tree/release-3.4.9/src/java/main/org/apache/jute
I think the only option for a deep understanding of the wire protocol would be to dig into this code.
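As a rough illustration of what that code does, here is a minimal Python sketch of how Jute's binary format appears to lay out a few primitive types. It assumes, based on my reading of the Jute binary archive classes, that integers are written as 4-byte big-endian signed values, booleans as a single byte, and strings as a 4-byte length followed by UTF-8 bytes (with a length of -1 for null). The function names are hypothetical helpers for this example, not part of any ZooKeeper API.

```python
import struct


def write_int(value):
    # Assumption: Jute ints are 4-byte big-endian signed integers.
    return struct.pack(">i", value)


def write_bool(value):
    # Assumption: Jute booleans are a single byte, 0 or 1.
    return struct.pack(">?", value)


def write_string(value):
    # Assumption: strings are length-prefixed UTF-8, with -1 meaning null.
    if value is None:
        return write_int(-1)
    data = value.encode("utf-8")
    return write_int(len(data)) + data


def write_request_header(xid, op_type):
    # Mirrors the two int fields (xid, type) of RequestHeader in zookeeper.jute.
    return write_int(xid) + write_int(op_type)


header = write_request_header(-2, 11)
print(header.hex())
```

A record is simply its fields written one after another in declaration order, so a record holding a path string and a boolean flag would be the output of write_string followed by the output of write_bool.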
After digging through a few layers of raw socket handling code (which uses either NIO or Netty depending on configuration), the real work of deserializing the payload happens in ZooKeeperServer#processPacket(ServerCnxn, ByteBuffer):
https://github.com/apache/zookeeper/blob/release-3.4.9/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L941
This is where it deserializes a RequestHeader, which is a common header of metadata at the front of all of the protocol's messages. The definition of RequestHeader is shown here:
https://github.com/apache/zookeeper/blob/release-3.4.9/src/zookeeper.jute#L88-L91
We can see it consists of two 4-byte integer fields: xid, a transaction ID that the client assigns to each request, followed by the type of the message. The type values are defined in ZooDefs here:
https://github.com/apache/zookeeper/blob/release-3.4.9/src/java/main/org/apache/zookeeper/ZooDefs.java#L28
Knowing all of this, let's go back to your packet capture and try to make sense of it:
Data: 00000008fffffffe0000000b
00000008 - payload length
fffffffe - xid (-2)
0000000b - op code ("ping")
At the front of each message (even before the RequestHeader), there is the length of the payload. Here we see a length of 8 bytes.
The next 4 bytes are the xid, fffffffe, which is -2 as a signed integer. The client uses this fixed value for pings instead of a regular request counter.
The final 4 bytes are the op code, 0000000b (or 11 in decimal). Reading ZooDefs, we can see that this is the "ping" operation. The "ping" operation is used for periodic heartbeats between client and server. There is no additional data required in the payload for the "ping" operation, so this is the end of this packet, and there is no additional data after it. For different operations, there would be additional data in the payload, representing the arguments to the operation.
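The same breakdown can be reproduced with a few lines of Python. This is only a sketch: decode_request and OP_NAMES are hypothetical names made up for this example, the op code table holds only the one value discussed here, and the first header field is labelled xid after its name in the Jute definition. It also assumes the frame is a regular request and not the initial connection handshake, which is not framed with a RequestHeader.

```python
import struct

# Only the op code discussed above; ZooDefs lists the rest.
OP_NAMES = {11: "ping"}


def decode_request(frame):
    # First 4 bytes: big-endian length of everything that follows.
    (length,) = struct.unpack_from(">i", frame, 0)
    # Next 8 bytes: the RequestHeader, two signed big-endian ints.
    xid, op = struct.unpack_from(">ii", frame, 4)
    # Whatever remains inside the declared length is the operation's arguments.
    body = frame[12:4 + length]
    return {
        "length": length,
        "xid": xid,
        "op": op,
        "op_name": OP_NAMES.get(op, "unknown"),
        "body": body,
    }


decoded = decode_request(bytes.fromhex("00000008fffffffe0000000b"))
print(decoded)
```

For the captured bytes this reports a length of 8, a first header field of -2, op code 11 ("ping"), and an empty body. For other operations, the body would hold the Jute-encoded arguments.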
I hope this helps.