
What are the delimiters for protobuf messages?

I'm working with serialized protobuf messages. I would like to know whether each message begins with $$__$$ and ends with the same marker.

Marko Bencik asked Sep 14 '18 17:09




2 Answers

For top-level messages (i.e. separate calls to serialize): there literally isn't one. Unless you add your own framing, messages actively bleed into each other, because the deserializer will (by default) just read to the end of the stream. So: if you have blindly concatenated multiple objects without your own framing protocol, you now have a problem.
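
To make the "bleeding" concrete, here is a minimal Python sketch that hand-decodes messages consisting only of varint fields. The helpers `read_varint` and `parse_varint_fields` are hypothetical names written for this illustration; they are not part of any protobuf library.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint from buf starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte of the varint
            return result, pos
        shift += 7


def parse_varint_fields(buf):
    """Parse a message made only of varint fields into (field_number, value) pairs."""
    fields = []
    pos = 0
    while pos < len(buf):  # no terminator: parsing simply runs to the end of the input
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        fields.append((field_number, value))
    return fields


first = bytes([0x08, 0x01])   # one message: field 1 = 1
second = bytes([0x08, 0x02])  # another message: field 1 = 2

# Concatenated, the two messages decode as a single message with field 1 set twice.
combined = parse_varint_fields(first + second)
```

Nothing in the four concatenated bytes marks where the first message stops, so a parser has no way to split them back apart.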

For the internals of messages, there are two ways of encoding sub-objects: length prefix and groups. Groups are largely deprecated, and the length-prefixed encoding of sub-objects is ambiguous, because it uses the same marker (the length-delimited wire type) that also describes strings, blobs (bytes), and "packed arrays". Without the schema you cannot tell these apart, so you probably don't want to try to handle that.
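
The ambiguity can be seen at the byte level. In this small Python sketch, `split_tag` and the `WIRE_TYPES` labels are illustrative names of my own; the tag layout (field number in the upper bits, wire type in the low three bits) follows the protobuf encoding documentation.

```python
# Wire type numbers from the protobuf encoding; the labels are informal.
WIRE_TYPES = {
    0: "varint",
    1: "64-bit",
    2: "length-delimited",
    3: "start group (deprecated)",
    4: "end group (deprecated)",
    5: "32-bit",
}


def split_tag(tag):
    """Split a single-byte field tag into (field_number, wire_type)."""
    return tag >> 3, tag & 0x07


inner_message = bytes([0x08, 0x01])  # a sub-message whose field 1 = 1

# Field 1 holding that sub-message: tag 0x0A, length 2, then the payload.
as_submessage = bytes([0x0A, len(inner_message)]) + inner_message

# Field 1 holding a 2-byte `bytes` value that happens to contain the same bytes.
as_bytes_field = bytes([0x0A, 2]) + b"\x08\x01"

same_encoding = as_submessage == as_bytes_field
field_number, wire_type = split_tag(as_submessage[0])
```

Both encodings are byte-for-byte identical, and the tag only says "field 1, length-delimited". Only the .proto schema tells the reader which interpretation is meant.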

So: it sounds like you need to add your own framing protocol, in which case the answer will be: whatever your framing protocol defines. Just remember that protobuf is binary, and any byte sequence can legitimately appear inside a message, so you cannot rely on a byte sequence as a sentinel / terminator. You should ideally use a length-prefix approach instead.
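
A length-prefix framing protocol can be sketched in a few lines of Python. The 4-byte big-endian header and the names `write_frame`, `read_exact` and `read_frames` are assumptions of this sketch, not anything protobuf prescribes; the payloads are treated as opaque, already-serialized bytes.

```python
import io
import struct

# Assumption of this sketch: every frame starts with a 4-byte unsigned big-endian length.
HEADER = struct.Struct(">I")


def write_frame(stream, payload):
    """Write the payload length, then the payload itself."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_exact(stream, n):
    """Read exactly n bytes or raise if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a frame")
    return data


def read_frames(stream):
    """Yield each framed payload until the stream ends cleanly between frames."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) != HEADER.size:
            raise EOFError("truncated length header")
        (length,) = HEADER.unpack(header)
        yield read_exact(stream, length)


buffer = io.BytesIO()
payloads = [b"\x08\x01", b"", b"\x08\x02\x10\x03"]  # stand-ins for serialized messages
for payload in payloads:
    write_frame(buffer, payload)

buffer.seek(0)
frames = list(read_frames(buffer))
```

Each recovered frame can then be handed to the normal protobuf parser on its own. Note that an empty payload is a valid frame, since a message with no fields set serializes to zero bytes. On a socket, a single `read` may return fewer bytes than requested, so `read_exact` would need a loop there.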

Marc Gravell answered Sep 24 '22 20:09


(In addition to the existing answers.)

A common framing method for protocol buffers is to prepend the message size, encoded as a varint, before the actual protobuf message.

The implementation is already part of the protobuf library, e.g.:

  • for Java: MessageLite.writeDelimitedTo(), Parser.parseDelimitedFrom()

  • for C++: functions in the header google/protobuf/util/delimited_message_util.h (e.g. SerializeDelimitedToFileDescriptor())
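
For languages or situations where those helpers are not at hand, the same layout (varint size, then the message bytes) can be sketched by hand. The Python below is only an illustration: `encode_varint`, `write_delimited` and `read_delimited` are hypothetical names, and the payloads stand in for already-serialized messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def write_delimited(stream, payload):
    """Write a varint length prefix followed by the payload."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Return the next payload, or None at a clean end of stream."""
    length = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # nothing left to read
            raise EOFError("truncated varint length")
        length |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated message body")
    return payload


stream = io.BytesIO()
large = b"\x00" * 300  # length 300 needs a two-byte varint prefix
write_delimited(stream, b"\x08\x01")
write_delimited(stream, large)

stream.seek(0)
first_payload = read_delimited(stream)
second_payload = read_delimited(stream)
end_marker = read_delimited(stream)
```

Compared with a fixed-width header, the varint prefix costs a single byte for messages shorter than 128 bytes, which is why it is a popular choice for streams of small messages.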

Good luck with your project!

EDIT: The official reference states that:

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)

vlp answered Sep 21 '22 20:09