
Design pattern for streaming protobuf messages

I want to stream protobuf messages to a file.

I have a protobuf message:

    message car {
      ... // some fields
    }

My Java code creates multiple objects of this car message.

How should I stream these messages to a file?

As far as I know, there are two ways to go about it.

  1. Define a wrapper message such as cars:

    message cars {
      repeated car c = 1;
    }

    Then have the Java code build a single cars object and write it to the file.

  2. Write the car messages one after another to a single file using the writeDelimitedTo function.

I am wondering which of these is the more efficient way to stream with protobuf.

When should I use pattern 1, and when should I use pattern 2?

This is what I got from https://developers.google.com/protocol-buffers/docs/techniques#large-data

I am not clear on what they are trying to say.

Large Data Sets

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.

Varun Tulsian asked Nov 12 '22 18:11


1 Answer

Have a look at Previous Question. Any difference in size and time will be minimal (option 1 may be faster, option 2 may be smaller).

My advice would be:

  1. Option 2 for big files, because you can process the file message by message.
  2. Option 1 if multiple languages are needed. In the past, the delimited format was not supported in all languages, though this seems to be changing.
  3. Otherwise, personal preference.
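
To make "process message by message" concrete, here is a minimal sketch of the framing that option 2 relies on: each record is written as a varint length prefix followed by its bytes, so a reader can pull one record at a time without loading the whole file. The class and method names (DelimitedFraming, writeDelimited, readDelimited) are hypothetical and not part of the protobuf library, and the payloads are placeholder strings standing in for serialized car messages. In real code, writeDelimitedTo on each message and parseDelimitedFrom on the generated class should do this for you.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the protobuf library.
public class DelimitedFraming {

    // Writes one record: a base-128 varint length prefix, then the payload.
    public static void writeDelimited(byte[] payload, OutputStream out) throws IOException {
        int size = payload.length;
        while ((size & ~0x7F) != 0) {
            out.write((size & 0x7F) | 0x80); // low 7 bits, continuation bit set
            size >>>= 7;
        }
        out.write(size);
        out.write(payload);
    }

    // Reads one record, or returns null if the stream ends before a new record starts.
    public static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // clean end of stream
        }
        int size = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            if (shift > 28) throw new IOException("length prefix too long");
            b = in.read();
            if (b == -1) throw new EOFException("truncated length prefix");
            size |= (b & 0x7F) << shift;
        }
        if (size < 0) throw new IOException("negative record length");
        byte[] payload = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(payload, off, size - off);
            if (n == -1) throw new EOFException("truncated record");
            off += n;
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the file.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String car : new String[] {"car-1", "car-2", "car-3"}) {
            // Placeholder bytes; a real program would write serialized car messages.
            writeDelimited(car.getBytes(StandardCharsets.UTF_8), file);
        }

        // Read the records back one at a time; only one is held in memory at once.
        InputStream in = new ByteArrayInputStream(file.toByteArray());
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

With option 1, by contrast, the whole cars message has to be built in memory before writing and parsed in full before the first car can be used, which is why it suits smaller files.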
Bruce Martin answered Nov 15 '22 03:11