
Google Protocol Buffers - Fixed size buffer?

Using Google Protocol Buffers, can I set a maximum size for all messages I encode?

If I know that what I encode is never larger than X bytes, can Protocol Buffers always produce a buffer of size Y, padding the output to size Y when I give it a smaller amount of data?

Roey asked May 12 '10 06:05



1 Answer

The wire format for protocol buffers doesn't make this trivial, and I'm not aware of anything built in that does it. One option would be to serialize into a fixed-size buffer with your own length header and pad with extra data as needed.

You need to add a length prefix because one is not added by default; without it, the reader would try to parse the padding at the end of your buffer as more message data. Even trailing 0s would not be legal (the reader would be looking for a field number, and 0 is not a valid one).

I can't comment on the C++ or Jon's C# version, but for my C# version (protobuf-net), you should be able to do something like (untested):

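using System;
using System.IO;
using ProtoBuf;

using (var ms = new MemoryStream(fixedLength)) {
     ms.SetLength(fixedLength); // pre-fill with zero padding; Position stays at 0
     Serializer.SerializeWithLengthPrefix(ms, obj, PrefixStyle.Base128);
     if (ms.Length > fixedLength) { // the stream grew, so the message did not fit
         throw new InvalidOperationException("message exceeds fixedLength");
     }
     byte[] arr = ms.ToArray(); // exactly fixedLength bytes; use this
}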

This should deserialize fine as long as the reader also uses DeserializeWithLengthPrefix with the same prefix style.
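For the reading side, a minimal sketch might look like the following. It assumes the buffer was written with PrefixStyle.Base128; the Sample type and the PaddedReader class are hypothetical names made up for illustration, and like the code above this is untested.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical message type, for illustration only.
[ProtoContract]
public class Sample
{
    [ProtoMember(1)]
    public int Id { get; set; }
}

// Hypothetical helper, not part of protobuf-net.
public static class PaddedReader
{
    // Reads one length-prefixed message from the start of a padded buffer.
    // The prefix tells the reader where the message ends, so the padding
    // after it is never parsed.
    public static Sample Read(byte[] arr)
    {
        using (var ms = new MemoryStream(arr))
        {
            // Must match the PrefixStyle used when the buffer was written.
            return Serializer.DeserializeWithLengthPrefix<Sample>(ms, PrefixStyle.Base128);
        }
    }
}
```

The prefix style is the part most likely to go wrong: if the writer and reader disagree on it, the reader will misinterpret the length and fail or return the wrong data.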


Re the questions (comments): SerializeWithLengthPrefix is a protobuf-net-specific method. There may be an equivalent in the C++ version, but it is pretty simple to write yourself. The easiest way to implement this from scratch is:

  • assume we will leave a fixed-length (4 byte) header to indicate how much actual data we have
  • skip 4 bytes (or write 00-00-00-00)
  • now serialize to the rest of the buffer
  • find how many bytes you just wrote
  • write that value back at the start of the buffer

And in reverse, to read it back:

  • read 4 bytes and interpret as an int
  • deserialize that much as data
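
The write and read steps above can be sketched without any protobuf library at all, treating the serialized message as opaque bytes. This is only an illustration under stated assumptions: FixedFrame, Pack and Unpack are hypothetical names, the header is assumed to be a little-endian 32-bit int, and the payload is assumed to be serialized already rather than written directly into the buffer after the header.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper, not part of any protobuf library.
public static class FixedFrame
{
    private const int HeaderSize = 4; // fixed-length header holding the payload size

    // Packs an already-serialized message into a buffer of exactly frameSize bytes.
    public static byte[] Pack(byte[] payload, int frameSize)
    {
        if (payload.Length > frameSize - HeaderSize)
            throw new ArgumentException("payload does not fit in the frame", nameof(payload));

        var frame = new byte[frameSize]; // zero-initialised, so the padding is already there
        // Assumption: little-endian header; any encoding works if both sides agree.
        BinaryPrimitives.WriteInt32LittleEndian(frame.AsSpan(0, HeaderSize), payload.Length);
        payload.CopyTo(frame, HeaderSize);
        return frame;
    }

    // Returns only the real message bytes, dropping the header and the padding.
    public static byte[] Unpack(byte[] frame)
    {
        if (frame.Length < HeaderSize)
            throw new InvalidDataException("frame is shorter than its header");

        int length = BinaryPrimitives.ReadInt32LittleEndian(frame.AsSpan(0, HeaderSize));
        if (length < 0 || length > frame.Length - HeaderSize)
            throw new InvalidDataException("corrupt length header");

        return frame.AsSpan(HeaderSize, length).ToArray();
    }
}
```

The bytes returned by Unpack are what you would hand to the protobuf deserializer, so it never sees the padding.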

It is a little more complex in protobuf-net, as it offers a few more options: how the int should be encoded, and whether or not to wrap this so that the entire thing can still be treated as a 100% valid protobuf stream. In particular, I suspect I've just described the behaviour you would get by asking SerializeWithLengthPrefix to use fixed-width encoding and "field 0".

Marc Gravell answered Oct 20 '22 11:10