Is there a way to get the maximum size of a given protobuf message once it is serialized? I'm referring to messages that don't contain "repeated" elements. Note that I'm not asking about the size of a protobuf message with specific content, but about the largest size it can possibly reach (the worst case).



Maximum serialized Protobuf message size

2 Answers

In general, any Protobuf message can be any length due to the possibility of unknown fields.

If you are receiving a message, you cannot make any assumptions about the length.

If you are sending a message that you built yourself, then you can perhaps assume that it only contains fields you know about. But in that case you can also easily compute the exact message size (for example, with ByteSizeLong() in C++ or getSerializedSize() in Java).

Thus it's usually not useful to ask what the maximum size is.

With that said, you could write code that uses the Descriptor interfaces to iterate over the FieldDescriptors for a message type (MyMessageType::descriptor()).

See: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor

Similar interfaces exist in Java, Python, and probably others.

Here are the rules to implement:

Each field is composed of a tag followed by some data.

For the tag:

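Field numbers 1-15 have a 1-byte tag.
Field numbers 16-2047 have a 2-byte tag. Larger field numbers need more: the tag is a varint of the field number shifted left by 3 bits, so it grows by one byte per 7 bits, up to 5 bytes for the largest allowed field number.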
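
The tag rule can be sketched in a few lines of Python. This is a minimal sketch, not library code: `varint_size` and `tag_size` are helper names invented here. It relies on the wire format's definition of a tag as the varint encoding of `(field_number << 3) | wire_type`, which also shows that tags grow past 2 bytes once the field number reaches 2048.

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to varint-encode a non-negative integer."""
    if value < 0:
        raise ValueError("varints encode unsigned values")
    size = 1
    while value >= 0x80:  # each byte carries 7 bits of payload
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int) -> int:
    """Size in bytes of a field tag for the given field number."""
    # The wire type only fills the low 3 bits, so it never changes the size.
    return varint_size(field_number << 3)


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```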

For the data:

bool is always one byte.
int32, int64, uint64, and sint64 have a maximum data length of 10 bytes (yes, int32 can be 10 bytes if it is negative, unfortunately).
sint32 and uint32 have a maximum data length of 5 bytes.
fixed32, sfixed32, and float are always exactly 4 bytes.
fixed64, sfixed64, and double are always exactly 8 bytes.
Enum-typed fields' maximum length depends on the maximum enum value:
- 0-127: 1 byte
- 128-16383: 2 bytes
- ... it's 7 bits per byte, but hopefully your enum isn't THAT big!
- Also note that negative values will be encoded as 10 bytes, but hopefully there aren't any.
Message-typed fields' maximum length is the maximum length of the message type plus bytes for the length prefix. The length prefix is, again, one byte per 7 bits of integer data.
Groups (which you shouldn't be using; they're a decrepit old feature deprecated before protobuf was even released publicly) have a maximum size equal to the maximum size of the contents plus a second field tag (see above).
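
As a rough illustration, the data-size rules above fit in a lookup table plus two small helpers. The names `MAX_SCALAR_DATA_SIZE`, `max_enum_data_size`, and `max_message_field_data_size` are made up for this sketch and are not part of any protobuf library. The sizes cover the data only; the field tag is counted separately.

```python
# Maximum data bytes per scalar type, excluding the field tag.
MAX_SCALAR_DATA_SIZE = {
    "bool": 1,
    "int32": 10, "int64": 10, "uint64": 10, "sint64": 10,
    "sint32": 5, "uint32": 5,
    "fixed32": 4, "sfixed32": 4, "float": 4,
    "fixed64": 8, "sfixed64": 8, "double": 8,
}


def varint_size(value: int) -> int:
    """Number of bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:  # 7 bits of payload per byte
        value >>= 7
        size += 1
    return size


def max_enum_data_size(enum_values) -> int:
    """Worst-case data size of an enum field, given its declared values."""
    if min(enum_values) < 0:
        return 10  # negative values are sign-extended to 64 bits
    return varint_size(max(enum_values))


def max_message_field_data_size(max_submessage_size: int) -> int:
    """Length prefix plus payload for a message-typed field."""
    return varint_size(max_submessage_size) + max_submessage_size
```

For example, a sub-message that can be at most 128 bytes long costs up to 130 bytes of data, because its length no longer fits in a 1-byte prefix.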

If your message contains any of the following, then its maximum length is unbounded:

Any field of type string or bytes. (Unless you know their max length, in which case, it's that max length plus a length prefix, like with sub-messages.)
Any repeated field. (Unless you know its max length, in which case, each element of the list has a max length as if it were a free-standing field, including tag. There is NO overall length prefix here. With [packed=true], which is the default for scalar numeric types in proto3, the field is instead written as a single tag and a length prefix followed by the elements' data, with no per-element tags.)
Extensions.
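
When you do know the limits, the bounded cases can be sketched as follows. The function names are hypothetical, and the caller is assumed to supply the tag size and per-element data size. Note that a string's limit has to be expressed in encoded (UTF-8) bytes, not characters. The packed variant assumes the standard packed layout: one tag, one length prefix, then the elements back to back.

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:  # 7 bits of payload per byte
        value >>= 7
        size += 1
    return size


def max_length_delimited_size(max_payload_bytes: int) -> int:
    """Data size of a string/bytes field with a known byte limit."""
    return varint_size(max_payload_bytes) + max_payload_bytes


def max_repeated_unpacked_size(tag_bytes: int, max_element_data_bytes: int,
                               max_count: int) -> int:
    """Every element carries its own tag; there is no overall prefix."""
    return max_count * (tag_bytes + max_element_data_bytes)


def max_repeated_packed_size(tag_bytes: int, max_element_data_bytes: int,
                             max_count: int) -> int:
    """One tag and one length prefix, then untagged elements."""
    payload = max_count * max_element_data_bytes
    return tag_bytes + varint_size(payload) + payload
```

With a 1-byte tag and at most four uint32 elements (5 bytes each), the unpacked form is bounded by 24 bytes and the packed form by 22.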

answered Oct 11 '22 00:10

Kenton Varda

As far as I know, there is no feature to calculate the maximum size in Google's own protobuf.

Nanopb generator computes the maximum size when possible and exports it as a #define in the generated file.

It is also quite simple to calculate manually for small messages, based on the protobuf encoding documentation.
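
To give an idea of what the manual calculation looks like, here is a small worked example. The `Sample` message and its fields are invented for illustration. Each field contributes its tag size plus its worst-case data size, following the encoding rules.

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:  # 7 bits of payload per byte
        value >>= 7
        size += 1
    return size


# Hypothetical message, invented for this example:
#   message Sample {
#     bool   active = 1;
#     uint32 count  = 2;
#     double ratio  = 16;
#   }
# Each entry: (field name, field number, maximum data bytes).
SAMPLE_FIELDS = [
    ("active", 1, 1),   # bool: always 1 byte
    ("count", 2, 5),    # uint32: varint, at most 5 bytes
    ("ratio", 16, 8),   # double: always 8 bytes
]


def max_message_size(fields) -> int:
    """Sum of tag size plus worst-case data size over all fields."""
    total = 0
    for _name, number, max_data in fields:
        total += varint_size(number << 3) + max_data
    return total


# (1 + 1) + (1 + 5) + (2 + 8) = 18 bytes
SAMPLE_MAX_SIZE = max_message_size(SAMPLE_FIELDS)
print(SAMPLE_MAX_SIZE)
```

This bound only holds for messages you build yourself, with no unknown fields and no unbounded field types.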

answered Oct 11 '22 00:10

jpa


Tags:

protocol-buffers

traveh
