Storing multiple messages in one protocol buffer binary file

Tags: c++, python, protocol-buffers

I have repeating messages which I want to store in a single file. Currently I have to wrap this repeating message in another message. Is there a way around this?

package foo;

message Box {
  required int32 tl_x = 1;
  required int32 tl_y = 2;
  required int32 w = 3;
  required int32 h = 4;
}

message Boxes {
  repeated Box boxes = 1;
}

asked Apr 07 '11 19:04

Dat Chu

3 Answers

Here's what the "Techniques" section of the Protocol Buffers documentation says about writing multiple messages to one file:

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)

There's also a conventional way of implementing this in C++ and Java. Take a look at this Stack Overflow thread for details: Are there C++ equivalents for the Protocol Buffers delimited I/O functions in Java?
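As a rough illustration of the size-prefix approach described above, here is a minimal Python sketch. It prefixes each payload with a varint-encoded length, which is the layout Java's `writeDelimitedTo` produces. The helper names (`write_varint`, `read_varint`, `write_delimited`, `read_delimited`) are made up for this example and are not part of the protobuf library. The payloads are plain bytes; in real code they would come from `message.SerializeToString()`.

```python
import io


def write_varint(stream, value):
    """Write a non-negative integer as a base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # high bit set: more bytes follow
        else:
            stream.write(bytes([byte]))
            return


def read_varint(stream):
    """Read one varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None  # no more records
            raise EOFError("stream ended inside a length prefix")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream, payload):
    """Write the payload size, then the payload itself."""
    write_varint(stream, len(payload))
    stream.write(payload)


def read_delimited(stream):
    """Yield each payload in the order it was written."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated record")
        yield payload


# Stand-in payloads. The first one is the wire encoding of
# Box(tl_x=1, tl_y=2, w=3, h=4); normally you would call
# box.SerializeToString() instead of writing bytes by hand.
payloads = [b"\x08\x01\x10\x02\x18\x03\x20\x04", b"", b"x" * 300]

buffer = io.BytesIO()
for payload in payloads:
    write_delimited(buffer, payload)

buffer.seek(0)
decoded = list(read_delimited(buffer))
```

Each item in `decoded` can then be handed to `ParseFromString` on a fresh `Box` instance. With a real file, open it in binary mode (`"wb"` / `"rb"`) in place of the `BytesIO` buffer.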

answered Sep 21 '22 15:09

alavrik

Protobuf doesn't support this directly. It serializes one message at a time, and the serialized bytes carry no information about the message's type (Box or Boxes) or its length. So if you want to store multiple messages in one file, you have to write the type and length of each message alongside it. The writing algorithm (in pseudocode) could look like this:

for every message {
    write(type_of_message) // 1 byte long
    write(length_of_serialized_message) // 4 bytes long
    write(serialized_message)
}

Load algorithm:

while (not end_of_file) {
    type = read(1) // 1 byte
    length = read(4) // 4 bytes
    buffer = read(length)
    switch (type) {
      case 1:
         deserialise_message_1(buffer)
         break
      case 2:
         deserialise_message_2(buffer)
         break
    }
}
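The pseudocode above could be sketched in Python roughly as follows. The 1-byte type tag and 4-byte length come from the answer; the little-endian byte order, the helper names (`write_record`, `read_records`), and the type ids `TYPE_BOX` / `TYPE_BOXES` are assumptions made for this example. Payloads are plain bytes standing in for the output of `SerializeToString()`.

```python
import io
import struct

# 1-byte type tag followed by a 4-byte unsigned length.
# "<" selects little-endian with no padding (an assumption; any fixed
# byte order works as long as the writer and the reader agree).
HEADER = struct.Struct("<BI")

# Hypothetical type ids for the two message types in the question.
TYPE_BOX = 1
TYPE_BOXES = 2


def write_record(stream, type_id, payload):
    """Write the header (type, length) and then the serialized message."""
    stream.write(HEADER.pack(type_id, len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield (type_id, payload) pairs until the stream is exhausted."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) != HEADER.size:
            raise EOFError("truncated header")
        type_id, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated payload")
        yield type_id, payload


stream = io.BytesIO()
# Wire encoding of Box(tl_x=1, tl_y=2, w=3, h=4), written by hand here.
write_record(stream, TYPE_BOX, b"\x08\x01\x10\x02\x18\x03\x20\x04")
# An empty Boxes message serializes to zero bytes.
write_record(stream, TYPE_BOXES, b"")

stream.seek(0)
records = list(read_records(stream))
```

On the reading side, the `switch` becomes a lookup on `type_id`: create a `Box` or `Boxes` instance depending on the tag and call `ParseFromString(payload)` on it.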

answered Sep 20 '22 15:09

Zuljin

I was just working on this problem and ended up going with Parquet. Parquet works perfectly for storing a bunch of Protobuf messages in a file and makes it easier to work with them later on.

This bit of code will create the Parquet file:

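import com.google.protobuf.Message;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.proto.ProtoParquetWriter;

Path path = new Path("/tmp/mydata.parq");
CompressionCodecName codecName = CompressionCodecName.SNAPPY;
int blockSize = 134217728;  // 128 MiB row group size
int pageSize = 1048576;     // 1 MiB page size
boolean enableDictionary = true;
boolean validating = false;

// Box.class is the generated protobuf class; it defines the Parquet schema.
ProtoParquetWriter<Message> writer
    = new ProtoParquetWriter<>(
        path,
        Box.class,
        codecName,
        blockSize,
        pageSize,
        enableDictionary,
        validating
    );

// messages is the collection of Box instances to store.
for (Message message : messages) {
    writer.write(message);
}

// Closing the writer flushes buffered rows and writes the file footer.
writer.close();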

It might not suit your use case but I thought that it was worth a mention here.

answered Sep 20 '22 15:09

Collin Krawll
