
Google protobuf and large binary blobs

I'm building software to remotely control radio hardware that is attached to another PC.

I plan to use ZeroMQ for the transport, with an RPC-like request-reply pattern on top of it in which different message types represent the operations.

While most of my messages will just carry control and status information, there should be an option to set a blob of data to transmit, or to request a blob of data to receive. These blobs will usually be in the range of 5-10 MB, but it should also be possible to use larger blobs of up to several hundred MB.

For the message format, I found Google Protocol Buffers very appealing because I could define one message type for the transport link, with optional elements for all the commands and responses.
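For concreteness, I have something like this in mind (all names here are placeholders, not a finished design):

message GetStatus { }

message SetGain {
    optional double gain_db = 1;
}

message TransmitBlob {
    optional bytes samples = 1;  // the 5-10 MB (or larger) payload
}

// One wrapper message per request; exactly one of the optional
// submessages is set, acting as a poor man's union.
message Request {
    optional GetStatus    get_status    = 1;
    optional SetGain      set_gain      = 2;
    optional TransmitBlob transmit_blob = 3;
}

However, the protobuf FAQ states that such large messages will negatively impact performance.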

So the question is: how bad would it actually be? What negative effects should I expect? I don't really want to base the whole communication layer on protobuf only to find out that it doesn't work.

asked Mar 10 '14 at 18:03 by jan




1 Answer

I don't have time to do this for you, but I would browse the Protobuf source code. Better yet, go ahead and write your code using a large bytes field, build protobuf from source, and step through it in a debugger to see what happens when you send and receive large blobs.

From experience, I can tell you that large repeated fields are not efficient unless they have the [packed=true] option, but that only works for primitive types.
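For illustration, packing looks like this on a repeated primitive field (the message and field names are made up):

message SampleVector {
    // [packed=true] is only valid on repeated scalar fields; it stores
    // all elements in one length-delimited record instead of writing
    // a tag byte per element.
    repeated float samples = 1 [packed=true];
}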

My gut feeling is that large bytes fields will be efficient, but this is totally unsubstantiated.

You could also bypass Protobuf for your large blobs:

message BlobInfo {
    required fixed64 size = 1;
    ...
}

message MainFormat {
    ...
    optional BlobInfo blob = 1;
}

then your parsing code looks like:

...
if (message.has_blob()) {
    // read the blob length from the protobuf header, then pull the
    // raw payload straight off the socket, bypassing protobuf
    uint64_t size = message.blob().size();
    zmqsock.recv(blob_buffer, size);
}
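If you go that route, the send side could look roughly like this: a minimal sketch assuming the cppzmq binding and the MainFormat/BlobInfo messages above (send_with_blob and the exact field numbers are my own invention, not an established API):

#include <cstdint>
#include <string>
#include <vector>
#include <zmq.hpp>
// plus the header generated from the .proto above, e.g. "mainformat.pb.h"

// Ship the protobuf header and the raw blob as two frames of a single
// ZeroMQ multipart message, so protobuf never has to encode the blob.
void send_with_blob(zmq::socket_t& sock,
                    MainFormat& message,
                    const std::vector<uint8_t>& blob)
{
    message.mutable_blob()->set_size(blob.size());

    std::string header;
    message.SerializeToString(&header);

    zmq::message_t header_frame(header.data(), header.size());
    sock.send(header_frame, ZMQ_SNDMORE);  // more frames follow

    zmq::message_t blob_frame(blob.data(), blob.size());
    sock.send(blob_frame, 0);              // final frame: the raw payload
}

This keeps the protobuf message a few bytes long regardless of blob size; ZeroMQ handles the large frame on its own.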
answered Oct 08 '22 at 16:10 by japreiss