Protocol buffer objects generated at runtime

A colleague of mine came up with the idea of generating Protocol Buffers classes at runtime. Meaning:

- There is a C++ server application and a Java client application communicating over TCP/IP via Protocol Buffers messages.
- The C++ application may use a different schema in different versions, and the schemas are not necessarily backward compatible.
- The Java application communicating with this server should support all possible server versions.

The idea is that the server sends the Protocol Buffers definition as part of the initial handshake, and the Java application generates the classes at runtime and uses them to communicate with the server.

I wonder whether this is even a viable idea, and whether there is some existing utility for such a use case.

Thanks

What you describe is actually already supported by the Protocol Buffers implementations in C++ and Java. All you have to do is transmit a FileDescriptorSet (as defined in google/protobuf/descriptor.proto) containing the FileDescriptorProtos representing each relevant .proto file, then use DynamicMessage to interpret the messages on the receiving end.

To get a FileDescriptorProto in C++, given message type Foo that is defined in that file, do:

    google::protobuf::FileDescriptorProto file;
    Foo::descriptor()->file()->CopyTo(&file);

Put all the FileDescriptorProtos that define the types you need, plus all the files that they import, into a FileDescriptorSet proto. Note that you can use google::protobuf::FileDescriptor (the thing returned by Foo::descriptor()->file()) to iterate over dependencies rather than explicitly naming each one.

Now, send the FileDescriptorSet to the client.

On the client, use FileDescriptor.buildFrom() to convert each FileDescriptorProto to a live Descriptors.FileDescriptor. You will have to make sure to build dependencies before dependents, since you have to provide the already-built dependencies to buildFrom() when building the dependents.
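The ordering requirement amounts to a topological sort of the import graph. Here is a minimal sketch in plain Java, assuming each file has been reduced to its name plus the names of the files it imports (in real code these would come from each FileDescriptorProto's getName() and getDependencyList()). The class name BuildOrder, its methods, and the file names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders .proto files so that every import precedes its importer. */
final class BuildOrder {
    /**
     * @param imports maps each file name to the names of the files it imports
     * @return the file names, ordered with dependencies before dependents
     */
    static List<String> sort(Map<String, List<String>> imports) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String file : imports.keySet()) {
            visit(file, imports, done, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(String file, Map<String, List<String>> imports,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(file)) {
            return;
        }
        if (!imports.containsKey(file)) {
            throw new IllegalArgumentException("file missing from the set: " + file);
        }
        if (!inProgress.add(file)) {
            throw new IllegalArgumentException("import cycle at: " + file);
        }
        for (String dependency : imports.get(file)) {
            visit(dependency, imports, done, inProgress, ordered);
        }
        inProgress.remove(file);
        done.add(file);
        ordered.add(file); // every import of this file is already in the list
    }

    public static void main(String[] args) {
        Map<String, List<String>> imports = new LinkedHashMap<>();
        imports.put("service.proto", List.of("common.proto", "types.proto"));
        imports.put("types.proto", List.of("common.proto"));
        imports.put("common.proto", List.of());
        System.out.println(sort(imports)); // prints [common.proto, types.proto, service.proto]
    }
}
```

Walking the resulting list in order, each file can be handed to buildFrom() together with the FileDescriptors already built for its imports.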

From there, you can use the FileDescriptor's findMessageTypeByName() to find the Descriptor for the specific message type you care about.

Finally, you can call DynamicMessage.newBuilder(descriptor) to construct a new builder instance for the type in question. DynamicMessage.Builder implements the Message.Builder interface, which has methods like getField() and setField() to manipulate the fields of the message dynamically (by specifying the corresponding FieldDescriptors).

Similarly, you can call DynamicMessage.parseFrom(descriptor, input) to parse messages received from the server.
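Put together, the client-side steps might look roughly like the sketch below. It assumes the protobuf-java library is on the classpath. The schema (a message Foo with a single int32 field id in foo.proto) is invented here and built in code only to stand in for a FileDescriptorProto received from the server; the class and method names are likewise hypothetical.

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;

final class DynamicRoundTrip {
    /** Stands in for a FileDescriptorProto received from the server (hypothetical schema). */
    static FileDescriptorProto exampleSchema() {
        FieldDescriptorProto id = FieldDescriptorProto.newBuilder()
                .setName("id")
                .setNumber(1)
                .setType(FieldDescriptorProto.Type.TYPE_INT32)
                .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
                .build();
        DescriptorProto foo = DescriptorProto.newBuilder().setName("Foo").addField(id).build();
        return FileDescriptorProto.newBuilder().setName("foo.proto").addMessageType(foo).build();
    }

    /** Builds a Foo dynamically, serializes it, parses it back, and returns the parsed id. */
    static int roundTrip(int value) throws Exception {
        // This schema imports nothing, so the array of already-built dependencies is empty.
        FileDescriptor file = FileDescriptor.buildFrom(exampleSchema(), new FileDescriptor[0]);
        Descriptor fooType = file.findMessageTypeByName("Foo");
        FieldDescriptor idField = fooType.findFieldByName("id");

        // Fields are addressed by FieldDescriptor rather than by generated setters.
        DynamicMessage outgoing = DynamicMessage.newBuilder(fooType)
                .setField(idField, value)
                .build();
        byte[] wire = outgoing.toByteArray();

        DynamicMessage incoming = DynamicMessage.parseFrom(fooType, wire);
        return (Integer) incoming.getField(idField);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(42));
    }
}
```

Note that getField() returns Object, so the caller has to know (or look up from the FieldDescriptor) which Java type to expect.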

Note that one disadvantage of DynamicMessage is that it is relatively slow. Essentially, it's like an interpreted language. Generated code is faster because the compiler can optimize for the specific type, whereas DynamicMessage has to be able to handle any type.

However, there's really no way around this. Even if you ran the code generator and compiled the class at runtime, the code which actually uses the new class would still be code that you wrote earlier, before you knew what type you were going to use. Therefore, it still has to use a reflection or reflection-like interface to access the message, and that is going to be slower than if the code were hand-written for the specific type.

But is it a good idea?

Well, this depends. What is the client actually going to do with the schema it receives from the server? Transmitting a schema over the wire doesn't magically make the client compatible with that version of the protocol -- the client still has to understand what the protocol means. If the protocol has been changed in a backwards-incompatible way, this almost certainly means that the meaning of the protocol has changed, and the client code has to be updated, schema transmission or not. The only case in which you can expect the client to continue working without an update is when the client performs a generic operation that depends only on the message content and not on its meaning -- for example, the client could convert the message to JSON without having to know what it means. But this is relatively unusual, particularly on the client end of an application. This is exactly why Protobuf doesn't send any type information by default: if the receiver doesn't know the meaning, the schema is usually useless.

If the issue is that the server is sending messages to the client which aren't intended to be interpreted at all, but just sent back to the server at a later time, then the client doesn't need the schema at all. Just transmit the message as bytes and don't bother parsing it. Note that a bytes field containing an encoded message of type Foo looks exactly the same on the wire as a field whose type is actually declared as Foo. You could actually compile the client and server against slightly different versions of the .proto file, where the client sees a particular field as bytes while the server sees it as a sub-message, in order to avoid the need for the client to be aware of the definition of that sub-message.
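To see why the two declarations are interchangeable on the wire, here is a small hand-rolled sketch of how a length-delimited field is encoded, using only the Java standard library. The class name, the field number 3, and the inner message (a Foo with one int32 field numbered 1, set to 150) are assumptions for illustration. The encoder takes only a field number and a payload; the declared type, bytes or Foo, never enters into it.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical demo of the Protocol Buffers length-delimited encoding (wire type 2). */
final class LengthDelimited {
    /** Writes a non-negative value as a base-128 varint. */
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes one length-delimited field: tag, payload length, payload. */
    static byte[] encodeField(int fieldNumber, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An encoded Foo with its int32 field 1 set to 150: tag 0x08, varint 0x96 0x01.
        byte[] encodedFoo = {0x08, (byte) 0x96, 0x01};
        // The same bytes result whether field 3 is declared as "bytes" or as "Foo".
        System.out.println(hex(encodeField(3, encodedFoo))); // prints 1a03089601
    }
}
```

A client that declares the field as bytes simply receives the payload untouched and can hand it back to the server unchanged.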

Tags: java, protocol-buffers

Jan Zyka

Kenton Varda
