If a C++ program receives a Protocol Buffers message that has a Protocol Buffers string field, which is represented by a std::string, what is the encoding of text in that field? Is it UTF-8?
No, it does not; the protobuf spec does not specify any "compression" as such. However, it does (by default) use "varint encoding", a variable-length encoding for integer data in which small values take less space: values 0-127 take 1 byte plus the field header (the tag).
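To make the size claim concrete, here is a minimal sketch of base-128 varint encoding for unsigned values. The function name EncodeVarint is hypothetical and chosen for illustration; it is not part of the protobuf C++ API, and signed (zigzag) types are deliberately left out.

```cpp
#include <cstdint>
#include <vector>

// Illustrative helper (not a protobuf library function).
// Base-128 varint: 7 payload bits per byte, least significant group first;
// the high bit of a byte is set when more bytes follow.
std::vector<std::uint8_t> EncodeVarint(std::uint64_t value) {
  std::vector<std::uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));  // last byte, high bit clear
  return out;
}
```

With this sketch, EncodeVarint(127) yields one byte, EncodeVarint(128) yields two, and EncodeVarint(300) yields the bytes AC 02. The field tag is written separately, in front of the value.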
Since protobuf uses tags to identify the field number of each field, there is no point in relying on the order of encoded values. A repeated field (an array) of a primitive numeric type (integer, float, double) is encoded, when packed, within a single key-value pair: tag + length in bytes + encoded bytes.
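The "tag + length + encoded bytes" layout can be sketched as follows for a repeated field of unsigned 32-bit integers. Both AppendVarint and EncodePackedUint32 are hypothetical names used only for this illustration; in real code the generated message classes do this serialization for you.

```cpp
#include <cstdint>
#include <vector>

// Illustrative helper: append `value` to `out` as a base-128 varint.
void AppendVarint(std::vector<std::uint8_t>& out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
}

// Illustrative helper: encode a packed repeated field as
// tag (field number and wire type 2), payload length, then the varints.
std::vector<std::uint8_t> EncodePackedUint32(
    std::uint32_t field_number, const std::vector<std::uint32_t>& values) {
  std::vector<std::uint8_t> payload;
  for (std::uint32_t v : values) AppendVarint(payload, v);

  std::vector<std::uint8_t> out;
  // Wire type 2 marks a length-delimited record.
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
  AppendVarint(out, payload.size());
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}
```

For field number 4 with the values 3, 270 and 86942, this produces the bytes 22 06 03 8E 02 9E A7 05: one tag, one length, and then the three values back to back with no per-element tags.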
Protobuf is a binary format, so inspecting or editing messages by hand becomes tedious.
Protobuf strings are always valid UTF-8 strings. See the Language Guide:
"A string must always contain UTF-8 encoded or 7-bit ASCII text."
(And ASCII is always also valid UTF-8.)
Not all protobuf implementations enforce this, but if I recall correctly, at least the Python library refuses to decode strings that are not valid UTF-8.
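Because enforcement varies between implementations, a C++ receiver that depends on the field being UTF-8 may want to check the std::string itself. The following is a minimal sketch of such a check; IsValidUtf8 is a hypothetical helper written for this answer, not a function from the protobuf library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative helper (not part of protobuf). Returns true if `s` is
// well-formed UTF-8: rejects stray or missing continuation bytes, overlong
// encodings, surrogate code points, and values above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
  const std::size_t n = s.size();
  // Smallest code point that legitimately needs 2, 3 or 4 bytes.
  static const std::uint32_t kMinForLen[5] = {0, 0, 0x80, 0x800, 0x10000};

  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = p[i];
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80) { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else return false;  // continuation byte in lead position, or 0xF8..0xFF

    if (len > n - i) return false;  // sequence is cut off by end of string
    for (std::size_t k = 1; k < len; ++k) {
      if ((p[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (p[i + k] & 0x3F);
    }
    if (len > 1 && cp < kMinForLen[len]) return false;  // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                    // beyond Unicode range
    i += len;
  }
  return true;
}
```

A caller could run this on the value returned by the field's accessor and treat a false result as a malformed message. Plain ASCII such as "hello" passes, as does "\xC3\xA9" (é), while a lone "\xFF" byte is rejected.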