Some protobuf messages, when serialized with SerializeToString(), contain a newline character (\n). Usually, when the first field of the message is a string, the newline character appears at the very start of the serialized message. But we also found messages with a newline character somewhere in the middle.
The newline character becomes a problem when you want to save the messages to a single file, one message per line. The newline breaks the line and makes the message invalid.
example.proto
syntax = "proto3";
package data_sources;

message StringFirst {
  string key = 1;
  bool valid = 2;
}

message StringSecond {
  bool valid = 1;
  string key = 2;
}
example.py
from protocol_buffers.data_sources.example_pb2 import StringFirst, StringSecond
print(StringFirst(key='some key').SerializeToString())
print(StringSecond(key='some key').SerializeToString())
output
b'\n\x08some key'
b'\x12\x08some key'
Is this expected / desired behaviour? How can one prevent the new line character?
protobuf is a binary protocol (unless you're talking about the optional json thing). So: any time you're treating it as text-like in any way, you're using it wrong and the behaviour will be undefined. This includes worrying about whether there are CR/LF characters, but it also includes things like the nul-character (0x00), which is often interpreted as end-of-string in text-based APIs in many frameworks (in particular, C-strings).
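Specifically: the serialized output has to be stored and transported as raw bytes (for example Python's bytes), not as text.
So, again: if the inclusion of "special" text characters is problematic, you're using it wrong.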
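To see why the byte 0x0A shows up at all, it can help to rebuild the output from the question by hand. The sketch below is not the protobuf library; it is a minimal illustration assuming only the documented wire format, where a field's tag is (field_number << 3) | wire_type and strings use wire type 2 (length-delimited). The helper names tag_byte and encode_short_string_field are made up for this example, and it only covers field numbers 1 to 15 and payloads under 128 bytes.

```python
LENGTH_DELIMITED = 2  # wire type used for string, bytes and nested messages


def tag_byte(field_number: int, wire_type: int) -> bytes:
    """Return the one-byte tag for small field numbers (1-15)."""
    if not 1 <= field_number <= 15:
        raise ValueError("this sketch only covers one-byte tags")
    return bytes([(field_number << 3) | wire_type])


def encode_short_string_field(field_number: int, value: str) -> bytes:
    """Encode a single string field: tag, one-byte length, UTF-8 payload."""
    payload = value.encode("utf-8")
    if len(payload) > 127:
        raise ValueError("this sketch only covers one-byte lengths")
    return tag_byte(field_number, LENGTH_DELIMITED) + bytes([len(payload)]) + payload


# Field 1 as a string: tag is (1 << 3) | 2 == 10 == 0x0A, which Python shows as \n.
print(encode_short_string_field(1, "some key"))  # b'\n\x08some key'

# Field 2 as a string: tag is (2 << 3) | 2 == 18 == 0x12, so no newline here.
print(encode_short_string_field(2, "some key"))  # b'\x12\x08some key'

# A 10-byte string puts 0x0A in the length byte, i.e. "in the middle".
print(encode_short_string_field(2, "0123456789"))  # b'\x12\n0123456789'
```

So the "newline" is just the number 10 doing its job as a tag or a length, which is why it cannot be prevented without changing the data itself.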
The most common way to handle binary data as text is to use a base-N encoding; base-16 (hex) is convenient to display and read, but base-64 is more efficient in terms of the number of characters required to convey the same number of bytes. So if possible: convert to/from base-64 as required. Base-64 never includes any of the non-printable characters, so you will never encounter CR/LF/nul.
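A minimal sketch of the one-message-per-line file using base-64, assuming Python's standard base64 module. The two byte strings stand in for the SerializeToString() results from the question, and the helper names write_lines and read_lines and the file name messages.b64 are made up for this example.

```python
import base64
import os
import tempfile

# Stand-ins for the bytes returned by SerializeToString() in the question.
messages = [b'\n\x08some key', b'\x12\x08some key']


def write_lines(path, payloads):
    """Write each binary payload as one base-64 text line."""
    with open(path, "w", encoding="ascii") as f:
        for payload in payloads:
            # b64encode emits no line breaks, unlike base64.encodebytes.
            f.write(base64.b64encode(payload).decode("ascii") + "\n")


def read_lines(path):
    """Read the file back and decode every line to the original bytes."""
    with open(path, "r", encoding="ascii") as f:
        return [base64.b64decode(line.rstrip("\n")) for line in f]


path = os.path.join(tempfile.mkdtemp(), "messages.b64")
write_lines(path, messages)
restored = read_lines(path)
print(restored == messages)
```

Each entry in restored can then be handed to ParseFromString on a fresh message instance. Note that base64.encodebytes would not work here, because it inserts its own newlines into longer output.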