Text encoding of Protocol Buffers string fields

If a C++ program receives a Protocol Buffers message that has a Protocol Buffers string field, which is represented by a std::string, what is the encoding of text in that field? Is it UTF-8?

Raedwald asked Sep 18 '18 10:09




1 Answer

By specification, a Protobuf string field always holds valid UTF-8 text.

See the Language Guide:

"A string must always contain UTF-8 encoded or 7-bit ASCII text."

(And ASCII is always also valid UTF-8.)

Not all protobuf implementations enforce this, but if I recall correctly, at least the Python library refuses to decode strings that are not valid UTF-8.
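Because enforcement varies between implementations, a C++ receiver that depends on the UTF-8 guarantee may want to check the bytes of the std::string itself. The following is a minimal sketch of such a check, following the well-formedness rules of RFC 3629. IsValidUtf8 is a hypothetical helper name chosen for illustration; it is not part of the protobuf API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of protobuf): returns true if `s` is
// well-formed UTF-8 per RFC 3629. Rejects overlong encodings, UTF-16
// surrogates (U+D800..U+DFFF), and code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 <= 0x7F) {  // 7-bit ASCII is a one-byte sequence
      ++i;
      continue;
    }
    std::size_t len = 0;      // number of continuation bytes expected
    unsigned char lo = 0x80;  // allowed range of the first continuation byte
    unsigned char hi = 0xBF;
    if (b0 >= 0xC2 && b0 <= 0xDF) {
      len = 1;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
      len = 2;
      if (b0 == 0xE0) lo = 0xA0;  // excludes overlong 3-byte forms
      if (b0 == 0xED) hi = 0x9F;  // excludes surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
      len = 3;
      if (b0 == 0xF0) lo = 0x90;  // excludes overlong 4-byte forms
      if (b0 == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
    } else {
      return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
    }
    if (n - i <= len) return false;  // sequence is truncated
    const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
    if (b1 < lo || b1 > hi) return false;
    for (std::size_t k = 2; k <= len; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if (b < 0x80 || b > 0xBF) return false;
    }
    i += len + 1;
  }
  return true;
}
```

A receiver could call this on the value returned by a generated string accessor before treating it as text. Plain ASCII always passes, while a lone Latin-1 byte such as 0xE9 is rejected.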

jpa answered Sep 29 '22 05:09
