Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement VARIANT in Protobuf

As part of my protobuf protocol I require the ability to send data of a dynamic type, a little bit like VARIANT. Roughly I require the data to be an integer, string, boolean or "other" where "other" (e.g. DateTime) is serialized as a string. I need to be able to use these as a single field and in lists in a number of different locations in the protocol.

How can this best be implemented while keeping message size minimal and performance optimal?

I'm using protobuf-net with C#.

EDIT:
I've posted a proposed answer below which uses what I think is the minimum of memory required.

EDIT2:
Created a github.com project at http://github.com/pvginkel/ProtoVariant with a complete implementation.

like image 891
Pieter van Ginkel Avatar asked Jun 29 '11 10:06

Pieter van Ginkel


People also ask

Can proto3 import proto2?

It's possible to import proto3 message types and use them in your proto2 messages, and vice versa. However, proto2 enums cannot be used in proto3 syntax.

How do you use any in protobuf?

To use the Any type, you must import the google/protobuf/any. proto definition. In the C# code, the Any class provides methods for setting the field, extracting the message, and checking the type. Protobuf's internal reflection code uses the Descriptor static field on each generated type to resolve Any field types.

How do I set default value in protobuf?

For bool s, the default value is false. For numeric types, the default value is zero. For enums , the default value is the first value listed in the enum's type definition. This means care must be taken when adding a value to the beginning of an enum value list.


1 Answers

Jon's multiple optionals covers the simplest setup, especially if you need cross-platform support. On the .NET side (to ensure you don't serialize unnecessary values), simply return null from any property that isn't a match, for example:

public object Value { get;set;}
[ProtoMember(1)]
public int? ValueInt32 {
    get { return (Value is int) ? (int)Value : (int?)null; }
    set { Value = value; }
}
[ProtoMember(2)]
public string ValueString {
    get { return (Value is string) ? (string)Value : null; }
    set { Value = value; }
}
// etc

You can also do the same using the bool ShouldSerialize*() pattern if you don't like the nulls.

Wrap that up in a class and you should be fine to use that at either the field level or list level. You mention optimal performance; the only additional thing I can suggest there is to perhaps consider treating as a "group" rather than "submessage", as this is easier to encode (and just as easy to decode, as long as you expect the data). To do that, use the Grouped data-format, via [ProtoMember], i.e.

[ProtoMember(12, DataFormat = DataFormat.Group)]
public MyVariant Foo {get;set;}

However, the difference here can be minimal - but it avoids some back-tracking in the output stream to fix the lengths. Either way, in terms of overheads a "submessage" will take at least 2 bytes; "at least one" for the field-header (perhaps taking more if the 12 is actually 1234567) - and "at least one" for the length, which gets bigger for longer messages. A group takes 2 x the field-header, so if you use low field-numbers this will be 2 bytes regardless of the length of the encapsulated data (it could be 5MB of binary).

A separate trick, useful for more complex scenarios but not as interoperable, is generic inheritance, i.e. an abstract base class that has ConcreteType<int>, ConcreteType<string> etc listed as subtypes - this, however, takes an extra 2 bytes (typically), so is not as frugal.

Taking another step further away from the core spec, if you genuinely can't tell what types you need to support, and don't need interoperability - there is some support for including (optimized) type information in the data; see the DynamicType option on ProtoMember - this takes more space than the other two options.

like image 136
Marc Gravell Avatar answered Oct 02 '22 23:10

Marc Gravell