Maximum serialized Protobuf message size

Is there a way to get the maximum size that a given protobuf message can have once it is serialized?

I'm referring to messages that don't contain "repeated" elements.

Note that I'm not referring to the size of a protobuf message with specific content, but to the maximum possible size it can reach (the worst case).

asked Jun 18 '15 12:06 by traveh


2 Answers

In general, any Protobuf message can be any length due to the possibility of unknown fields.

If you are receiving a message, you cannot make any assumptions about the length.

If you are sending a message that you built yourself, then you can perhaps assume that it only contains fields you know about -- but then again, you can also easily compute the exact message size in this case.

Thus it's usually not useful to ask what the maximum size is.

With that said, you could write code that uses the Descriptor interfaces to iterate over the FieldDescriptors for a message type (MyMessageType::descriptor()).

See: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor

Similar interfaces exist in Java, Python, and probably others.

Here are the rules to implement:

Each field is composed of a tag followed by some data.

For the tag:

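  • Field numbers 1-15 have a 1-byte tag.
  • Field numbers 16-2047 have 2-byte tags. Larger field numbers need more: the tag is a varint holding the field number shifted left by 3 bits, so it grows by one byte per 7 bits, up to 5 bytes for the largest field number (536,870,911).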
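
The tag rule can be sketched in a few lines of C++. This is only an illustration: `VarintSize` and `TagSize` are hypothetical helper names, not part of the protobuf library, and the sketch assumes a compiler that accepts C++14 `constexpr` functions.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to store `value` as a base-128 varint (7 payload bits per byte).
constexpr std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// A tag is the varint encoding of (field_number << 3) | wire_type.
// The wire type only fills the low 3 bits, so it never changes the byte count.
constexpr std::size_t TagSize(std::uint32_t field_number) {
  return VarintSize(static_cast<std::uint64_t>(field_number) << 3);
}

// Compile-time checks of the boundaries.
static_assert(TagSize(15) == 1, "field numbers 1-15 use a 1-byte tag");
static_assert(TagSize(16) == 2, "field number 16 needs a 2-byte tag");
static_assert(TagSize(2047) == 2, "2047 is the last 2-byte field number");
static_assert(TagSize(2048) == 3, "2048 needs a 3-byte tag");
static_assert(TagSize(536870911) == 5, "largest field number, 2^29 - 1");
```

Because the functions are `constexpr`, the checks run at compile time; the same functions can also be called at run time with field numbers read from a descriptor.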

For the data:

  • bool is always one byte.
  • int32, int64, uint64, and sint64 have a maximum data length of 10 bytes (yes, an int32 takes 10 bytes when it is negative, because it is sign-extended to 64 bits before being encoded as a varint).
  • sint32 and uint32 have a maximum data length of 5 bytes.
  • fixed32, sfixed32, and float are always exactly 4 bytes.
  • fixed64, sfixed64, and double are always exactly 8 bytes.
  • Enum-typed fields' maximum length depends on the maximum enum value:
    • 0-127: 1 byte
    • 128-16383: 2 bytes
    • ... it's 7 bits per byte, but hopefully your enum isn't THAT big!
    • Also note that negative values will be encoded as 10 bytes, but hopefully there aren't any.
  • Message-typed fields' maximum length is the maximum length of the message type plus bytes for the length prefix. The length prefix is, again, one byte per 7 bits of integer data.
  • Groups (which you shouldn't be using; they're a decrepit old feature deprecated before protobuf was even released publicly) have a maximum size equal to the maximum size of the contents plus a second field tag (see above).
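
Putting the tag and data rules together, a worst-case calculator might look like the sketch below. It is a minimal illustration under stated assumptions, not the protobuf API: `Kind`, `FieldSpec`, `MaxDataSize`, and `MaxMessageSize` are hypothetical names, and the hand-written `FieldSpec` tables stand in for what real code would read from the `FieldDescriptor`s of a message type. It assumes C++14 `constexpr` support and covers only the bounded field types listed above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical field categories, grouped by the size rules above.
enum class Kind {
  kBool,      // always 1 byte
  kVarint5,   // sint32, uint32: at most 5 bytes
  kVarint10,  // int32, int64, uint64, sint64: at most 10 bytes
  kFixed32,   // fixed32, sfixed32, float: exactly 4 bytes
  kFixed64,   // fixed64, sfixed64, double: exactly 8 bytes
  kEnum,      // depends on the smallest and largest declared value
  kMessage,   // length prefix + nested message
  kGroup      // nested contents + closing (end-group) tag
};

// Hypothetical stand-in for the information a FieldDescriptor provides.
struct FieldSpec {
  std::uint32_t number;
  Kind kind;
  std::int32_t min_enum;        // only read for kEnum
  std::int32_t max_enum;        // only read for kEnum
  const FieldSpec* sub_fields;  // only read for kMessage and kGroup
  std::size_t sub_count;
};

// Bytes needed to store `value` as a base-128 varint.
constexpr std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) { value >>= 7; ++bytes; }
  return bytes;
}

// Tag = varint of (field_number << 3 | wire_type).
constexpr std::size_t TagSize(std::uint32_t field_number) {
  return VarintSize(static_cast<std::uint64_t>(field_number) << 3);
}

constexpr std::size_t MaxMessageSize(const FieldSpec* fields, std::size_t count);

// Worst-case size of one field's data, excluding its own tag.
constexpr std::size_t MaxDataSize(const FieldSpec& f) {
  switch (f.kind) {
    case Kind::kBool:     return 1;
    case Kind::kVarint5:  return 5;
    case Kind::kVarint10: return 10;
    case Kind::kFixed32:  return 4;
    case Kind::kFixed64:  return 8;
    case Kind::kEnum:  // any negative value is encoded in 10 bytes
      return f.min_enum < 0
                 ? 10
                 : VarintSize(static_cast<std::uint64_t>(f.max_enum));
    case Kind::kMessage: {
      const std::size_t body = MaxMessageSize(f.sub_fields, f.sub_count);
      return VarintSize(body) + body;  // length prefix + contents
    }
    case Kind::kGroup:  // contents + second (end-group) tag
      return MaxMessageSize(f.sub_fields, f.sub_count) + TagSize(f.number);
  }
  return 0;  // not reached for valid Kind values
}

// Worst-case size of a message: the sum of tag + data over all fields.
constexpr std::size_t MaxMessageSize(const FieldSpec* fields, std::size_t count) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    total += TagSize(fields[i].number) + MaxDataSize(fields[i]);
  }
  return total;
}

// Example: Inner { bool = 1; fixed32 = 2; }
constexpr FieldSpec kInner[] = {
    {1, Kind::kBool, 0, 0, nullptr, 0},
    {2, Kind::kFixed32, 0, 0, nullptr, 0},
};
// Example: Outer { int32 = 1; Inner = 16; enum with values 0..200 = 3; }
constexpr FieldSpec kOuter[] = {
    {1, Kind::kVarint10, 0, 0, nullptr, 0},
    {16, Kind::kMessage, 0, 0, kInner, 2},
    {3, Kind::kEnum, 0, 200, nullptr, 0},
};
static_assert(MaxMessageSize(kInner, 2) == 7, "(1 + 1) + (1 + 4)");
static_assert(MaxMessageSize(kOuter, 3) == 24, "(1 + 10) + (2 + 1 + 7) + (1 + 2)");
```

The result is only a bound for messages that contain exactly the known fields; as noted above, unknown fields in a received message can make it arbitrarily larger.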

If your message contains any of the following, then its maximum length is unbounded:

  • Any field of type string or bytes. (Unless you know their max length, in which case, it's that max length plus a length prefix, like with sub-messages.)
  • Any repeated field. (Unless you know its max length, in which case each element of the list has a max length as if it were a free-standing field, including tag. There is NO overall length prefix here, unless you are using [packed=true] on a numeric field, in which case the elements are written back to back without their own tags, behind a single tag and one overall length prefix.)
  • Extensions.
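
When the application does enforce limits on these fields, the bounded cases in the parentheses above can be sketched as follows. The function names are hypothetical, the limits are assumed to be known from outside the schema, and the packed case assumes the serializer writes all elements as a single chunk behind one tag and one length prefix.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to store `value` as a base-128 varint.
constexpr std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) { value >>= 7; ++bytes; }
  return bytes;
}

// Tag = varint of (field_number << 3 | wire_type).
constexpr std::size_t TagSize(std::uint32_t field_number) {
  return VarintSize(static_cast<std::uint64_t>(field_number) << 3);
}

// string or bytes field with an application-enforced maximum length:
// tag + length prefix + payload.
constexpr std::size_t MaxLengthDelimitedField(std::uint32_t field_number,
                                              std::size_t max_len) {
  return TagSize(field_number) + VarintSize(max_len) + max_len;
}

// Non-packed repeated field: every element carries its own tag and there is
// no overall length prefix. `max_element_data` is the worst-case data size
// of one element (for strings and messages, including its length prefix).
constexpr std::size_t MaxRepeatedField(std::uint32_t field_number,
                                       std::size_t max_count,
                                       std::size_t max_element_data) {
  return max_count * (TagSize(field_number) + max_element_data);
}

// Packed repeated numeric field: one tag, one length prefix, then the
// element data back to back.
constexpr std::size_t MaxPackedField(std::uint32_t field_number,
                                     std::size_t max_count,
                                     std::size_t max_element_data) {
  return TagSize(field_number) + VarintSize(max_count * max_element_data) +
         max_count * max_element_data;
}

static_assert(MaxLengthDelimitedField(1, 32) == 34, "1 + 1 + 32");
static_assert(MaxLengthDelimitedField(1, 200) == 203, "1 + 2 + 200");
static_assert(MaxRepeatedField(2, 10, 5) == 60, "10 * (1 + 5)");
static_assert(MaxPackedField(2, 10, 5) == 52, "1 + 1 + 50");
```

For ten uint32 values in field 2, packing lowers the bound from 60 to 52 bytes, because the per-element tags are replaced by a single length prefix.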
answered Oct 11 '22 00:10 by Kenton Varda


As far as I know, there is no feature to calculate the maximum size in Google's own protobuf.

The nanopb generator computes the maximum size when possible and exports it as a #define in the generated file.

It is also quite simple to calculate manually for small messages, based on the protobuf encoding documentation.

answered Oct 11 '22 00:10 by jpa