How do I read binary C++ protobuf data using Python protobuf?

The Python version of Google protobuf gives us only:

SerializeToString()

Whereas the C++ version gives us both:

SerializeToArray(...)
SerializeAsString()

Our C++ code writes the file in binary format, and we'd like to keep it that way. That said, is there a way of reading the binary data into Python and parsing it as if it were a string?

Is this the correct way of doing it?

binary = get_binary_data()
binary_size = get_binary_size()

# Copy the binary data into a string, one byte at a time
string = b''
for i in range(binary_size):
    string += binary[i:i + 1]

message = MyMessage()
message.ParseFromString(string)

Update:

Here's a new example, and a problem:

message_length = 512

file = open('foobars.bin', 'rb')

eof = False
while not eof:

    data = file.read(message_length)
    eof = not data

    if not eof:
        foo_bar = FooBar()
        foo_bar.ParseFromString(data)

When we get to the foo_bar.ParseFromString(data) line, I get this error:

Exception Type: DecodeError
Exception Value: Too many bytes when decoding varint.

Update 2:

It turns out that padding in the binary data was throwing protobuf off: more bytes were being passed in than the message contained, as the error suggests (in this case, the extra bytes were the padding).
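To illustrate why padding can surface as this particular error, here is a minimal sketch of varint decoding in plain Python 3. The `decode_varint` helper is hypothetical, written for illustration and not taken from the protobuf library. It assumes the padding bytes are 0xCC (the value filtered out in the workaround in this question), which has its high "continuation" bit set, so a run of them looks like a varint that never ends.

```python
def decode_varint(buf, pos=0):
    """Decode one protobuf-style varint from buf, starting at pos.

    Returns (value, next_pos). Each byte contributes its low 7 bits;
    a set high bit means another byte follows.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("Truncated varint.")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
        # A 64-bit value never needs more than 10 bytes.
        if shift >= 64:
            raise ValueError("Too many bytes when decoding varint.")


# The value 300 is encoded as the two bytes 0xAC 0x02.
print(decode_varint(b"\xac\x02"))

# A run of 0xCC padding never clears the high bit, so decoding gives up.
try:
    decode_varint(b"\xcc" * 16)
except ValueError as exc:
    print(exc)
```

In other words, the real message probably parses fine, and it is the trailing bytes after it that the parser cannot make sense of.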

This padding comes from using the C++ protobuf function SerializeToArray on a fixed-length buffer. To eliminate it, I have used this temporary code:

message_length = 512

with open('foobars.bin', 'rb') as f:
    while True:
        data = f.read(message_length)
        if not data:
            break  # end of file

        # Strip the 0xCC padding bytes. Yuck! This also removes any
        # 0xCC byte that is genuinely part of a message.
        string = data.replace(b'\xcc', b'')

        foo_bar = FooBar()
        foo_bar.ParseFromString(string)

I think there is a design flaw here. I will re-implement my C++ code so that it writes variable-length arrays to the binary file. As advised by the protobuf documentation, I will prefix each message with its binary size so that I know how much to read when I open the file in Python.
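A minimal sketch of what the reading side of such a length-prefixed file could look like in Python 3. The framing here is an assumption chosen for illustration (a 4-byte little-endian length before each message), and `write_framed` and `read_framed` are hypothetical helper names; the C++ writer would have to emit the same prefix.

```python
import io
import struct

# Assumption: each record is a 4-byte little-endian length followed by
# exactly that many bytes of serialized message.
LENGTH_PREFIX = struct.Struct("<I")


def write_framed(stream, payload):
    """Write one length-prefixed record to a binary stream."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_framed(stream):
    """Yield each payload from a stream of length-prefixed records."""
    while True:
        header = stream.read(LENGTH_PREFIX.size)
        if not header:
            return  # clean end of file
        if len(header) < LENGTH_PREFIX.size:
            raise EOFError("Truncated length prefix.")
        (size,) = LENGTH_PREFIX.unpack(header)
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("Truncated record.")
        yield payload


# Round-trip a few payloads through an in-memory "file".
buffer = io.BytesIO()
for record in (b"\x08\x01", b"", b"\xcc\xcc\xcc"):
    write_framed(buffer, record)
buffer.seek(0)
records = list(read_framed(buffer))
print(records)
```

Each payload yielded by `read_framed` could then be passed straight to `foo_bar.ParseFromString(payload)`, with no padding to strip and no risk of dropping a genuine 0xCC byte.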

Nick Bolton asked Dec 07 '09 14:12


2 Answers

I'm not an expert with Python, but you can pass the result of a file.read() operation into message.ParseFromString(...) without having to build a new string type or anything.

Mike Weller answered Nov 08 '22 16:11


Python strings can contain any character, i.e. they are capable of holding "binary" data directly. There should be no need to convert from string to "binary".
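As a quick check of this point, the sketch below writes every possible byte value to a temporary file and reads it back in binary mode. It assumes Python 3, where the binary-safe type is `bytes` rather than `str`; the temporary file is only there for illustration.

```python
import os
import tempfile

# Every possible byte value, including NUL and the 0xCC padding byte.
payload = bytes(range(256))

fd, path = tempfile.mkstemp(suffix=".bin")
try:
    with os.fdopen(fd, "wb") as out:
        out.write(payload)
    # 'rb' matters: text mode would try to decode the bytes.
    with open(path, "rb") as src:
        data = src.read()
finally:
    os.remove(path)

print(type(data).__name__, len(data), data == payload)
```

The result of `read()` on a file opened with `'rb'` comes back unchanged, so it can be handed to `ParseFromString` as is, provided it contains exactly one serialized message.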

unwind answered Nov 08 '22 16:11